$ 7.00
£ 5.00


FUNDAMENTALS OF STATISTICS 
VOLUME ONE 


( Sixth Revised Edition ) 
A. M. Goon, M. K. Gupta and
B. Dasgupta 


Statisticians generally recognise a division of their field into
two broad areas, viz. statistical theory (or mathematical statistics)
and statistical methods. Covering a wide ground, the two volumes of
Fundamentals of Statistics give a systematic and self-contained
account of the methods. The theoretical principles underlying the
techniques are nonetheless properly stated and, as far as possible,
mathematical derivations of results are not avoided either. Ever
since its first appearance more than two decades back, the book has
remained an outstanding work of its kind, and it has proved as
useful to Pass and Honours students of Statistics as to research
workers in the various sciences.
The first three chapters of the present volume are devoted to the
mathematical tools (including mathematical probability) that are
needed for an understanding of the subsequent chapters and to
numerical mathematics. Chapters 4-20, which form the main body of
the volume, present a detailed description of statistical concepts
and techniques of a general type. These are characterised by a
rather elaborate treatment of theoretical distributions, of
association, of correlation and regression, and of the various forms
of statistical inference (including a discussion of non-parametric
methods). Copious examples and exercises are there, too, for helping
the reader towards a better understanding of the subject-matter.
A notable feature is Appendix A, on the normal law of errors, which
has enhanced the value of the volume.

Rs. 70.00


FUNDAMENTALS
OF STATISTICS


VOLUME ONE 


A. M. GOON 
North-Eastern Hill University, Shillong


M. K. GUPTA 
Kalyani University, Kalyani 
B. DASGUPTA 
Professor & Head of the Department of Statistics,
Presidency College, Calcutta 


CALCUTTA
THE WORLD PRESS PRIVATE LTD.


© Copyright 1962, '63, '68, '71, '75, '83 The World Press Private Ltd.,
37A College Street, Calcutta-700073


Atindra Mohan Goon, 1931— 
Milan Kumar Gupta, 1932— 
Bhagabat Dasgupta, 1933-— 




First Edition : February 1962 
Second Edition : September 1963 
Third Edition : January 1968
Fourth Edition : March 1971
Fifth Edition : February 1975
Reprinted : 1979
Reprinted : 1982
Sixth Revised Edition : July 1983
Reprinted : 1987


Published by S. Bhattacharjee for The World Press Private Ltd.,
37A College Street, Calcutta-700073. Composed by Book
Composition Service, 8 Naren Sen Square, Calcutta-7,
and printed by P. K. Bhattacharjee at Alakananda
Press, 9 Anthony Bagan Lane, Calcutta-700009


TO OUR TEACHERS
IN THE DEPARTMENTS OF STATISTICS OF
PRESIDENCY COLLEGE AND CALCUTTA UNIVERSITY


“From time immemorial men must have been compiling
information for peace and war. Statistics is in this sense as 
old as statecraft.” 

“Statistics, like engineering, requires all the help it can 
receive from mathematics ; but... (statistics) can never become 
a branch of mathematics.” 


P. C. Mahalanobis 




PREFACE 


Until the mid-1960s, students of statistics had to depend very 
heavily on class lectures ; the more serious ones had to supplement 
the knowledge gathered from this source through the reading of books 
of specialised types. To undergraduate students (Pass and Honours) 
in Indian universities, this situation, arising out of a lack of
textbooks that would deal with the subject in a comprehensive manner,
had been a serious handicap. It was in an attempt to fill this
lacuna that we started out to write Fundamentals of Statistics more
than twenty years ago. The book was well received by students and
teachers alike, the first edition going out of print in less than a year.
Although there has since been a spate of similar publications, both
Indian and foreign, the popularity of the book remains undiminished.
Apparently, Indian students find the book a reliable guide, written 
in a language that is intelligible to them and based on examples 
that have a ready appeal, being mainly taken from the Indian 
context. It has meanwhile gone through five editions, the fifth 
having been reprinted twice, and has grown much bigger in size. 

In undertaking this sixth edition of Volume One of Fundamentals, 
we have again subjected it to a thorough revision. Quite a number
of new topics have been added, together with a number of new
examples and exercises. Among these, much of the material
covered in Chapter 4, some of the material included in Chapter 17
and most of what is included in Chapter 18 deserve special mention.
At the same time, mistakes detected in the fifth edition, by fellow
teachers or by ourselves, have been removed and slight changes
made here and there so as to improve the exposition.

Fundamentals of Statistics, we should again point out, is a book 
of statistical methods (as distinct from statistical theory or mathematical 
statistics). But perhaps in quite a few respects it can claim some 
distinction among the numerous books of its kind that are now 
available in the market. For one thing, it is far from being a
cook-book of statistics. The theoretical principles underlying the
methods are discussed in some detail and, as far as possible,
mathematical derivations of the results are supplied or indicated. 
Apart from meeting the requirements of university curricula, this is 
meant to enable the practitioner to go about his job with confidence, 
using the right—or a reasonably good—tool on every occasion. 

Volume One of Fundamentals is concerned with the essential 
mathematical tools, including the numerical techniques of mathe- 
matics and the elements of probability theory, and with statistical 
methods of a general type. Volume Two, on the other hand, covers 
the important areas of analysis of variance and designs, of both 
experiments and sample surveys, and statistical methods that 
are meant for some special fields of application, viz. demography, 
psychology and education, economics and industrial quality control. 

Volume One starts with a brief survey (in Chapter 1) of the 
concepts of algebra (including matrix algebra) and calculus that 
are essential for an understanding of the subsequent chapters. 
The mathematical bases of the many approximations that one 
has to make while applying statistical methods to real problems—in 
terms of interpolation, numerical differentiation and integration, 
and numerical solution of equations—are dealt with in Chapter 2. 
The same chapter includes a section on the common types of error 
that arise in numerical work. In Chapter 3 are presented the basic 
ideas and results of probability theory. For the sake of simplicity, 
it is the case of a finite sample space with equally likely elements 
that has been adequately treated here, but the mode of extension 
to the general case (via Kolmogorov’s axiomatic approach) is also 
indicated. 

Part Two of the volume, which deals with statistics proper, 
constitutes the main body of the volume. It starts (in Chapter 4) 
with an introduction to statistical methods, where the nature of the 
subject and its possible uses are indicated and a historical note is 
given on the origin and development of statistical ideas. Together 
with methods of collection and presentation of data, the basic 
concepts of frequency distribution, central tendency, dispersion, 
etc., are discussed in Chapters 5-9. The concepts are elaborated 
here by keeping in view mainly sample data, but actually the 
distinction between a population and a sample from that population 
(not very important at this stage) has been deliberately kept in
the background. Chapter 10 gives numerous (univariate) theoretical 
distributions, including those of the Pearsonian system, which are 
employed as simplified models of distributions encountered in real 
life. Chapters 11-14 are concerned with association of attributes 
and that of variables, together with such topics as rank correlation 
and intra-class correlation. (The sections on tetrachoric correlation 
and biserial correlation are new additions to the volume.) A 
notable feature here is Section 11.7, on the possible causal relation- 
ship between smoking and lung cancer, where the logical aspects 
of the so-called ‘cancer controversy’ are highlighted. The last six 
chapters of the volume are devoted to random sampling and 
statistical inference. In Chapter 15, the idea of random sampling
is introduced and some important sampling distributions (under 
conditions of random sampling) are derived. The basic principles 
of point estimation, interval estimation and hypothesis-testing are 
the subject-matter of Chapter 16. In the next two chapters, these 
principles are used to develop tests and estimates for some typical
problems, the normal distribution set-up being treated at length.
The section on the Fisher-Behrens problem is a new addition, 
and so is a large part of the material included under regression 
problems. Most of Chapter 17, viz. the sections concerned with 
tests for homogeneity of several variances, tests and estimates in 
multiple regression analysis, tests for outliers, combination of tests 
and tests for normality, is also new. While Chapters 17 and 18
present methods that are applicable to samples of any size
whatsoever, those which are approximate and applicable only to large
samples, but are simpler, are discussed in Chapter 19. Chapter 20
deals with non-parametric methods, which are playing a very
important rôle in present-day statistical practice. Two notable
additions here are the sections on the corner test for association and
Kolmogorov-Smirnov tests.

Some readers are likely to complain that multivariate analysis 
has not received in the book its due share. We have refrained from 
including this topic lest the volume should become too unwieldy
and would just refer the interested reader to M. G. Kendall’s
monograph on multivariate analysis and the relevant parts of
C. R. Rao’s Advanced Statistical Methods in Biometric Research.




We have drawn upon our fairly long experience as teachers of
Statistics not only in regard to the exposition but also in formulating
the numerous examples and exercises included in the book. These
should help the reader towards a better grasp of the subject-matter
and to readily adapt the method used in solving any problem to
other problems of a similar type. Typical essay-type questions are
there, too, mainly to help the student prepare for examinations.
For working out the numerical problems, the use of an ordinary
electronic pocket calculator will be found adequate. We would
also advise the more serious reader to familiarise himself with the
contents of the following publications, to which occasional references
will be made in the text :

[1] Biometrika Tables, Vol. I, by E. S. Pearson and H. O. Hartley,
Cambridge University Press, 1966.

[2] Formulae and Tables for Statistical Work, by C. R. Rao,
S. K. Mitra, A. Mathai and K. G. Ramamurthy, Statistical
Publishing Society, 1975.

We claim no great originality either in the choice or in the 
presentation of material. On the contrary, we shall deem our 
efforts amply rewarded if we have been able to give in this book an 
intelligible, systematic and reasonably accurate account of the 
principles and procedures of statistics. In carrying out the successive
revisions, too, our sole aim has been to make the book increasingly 
useful for students and professional statisticians as well as for 
research workers, who are relying more and more on statistical 
methods in their investigations. 

We should put on record the encouragement and suggestions that 
we have received from our teachers, Professors A. Bhattacharyya, 
P. K. Bose, B. N. Ghosh, H. K. Nandi and P. K. Banerji, and from
numerous friends and colleagues. Our friend Professor R. Datta
continues to be a source of inspiration. We also take this
opportunity to acknowledge once again our debt of gratitude to the
publishers, the World Press, for the excellent co-operation that 
we have been getting from them. 


Presidency College, Calcutta                    THE AUTHORS
June 1983 


CONTENTS 


Part One : MATHEMATICAL PRELIMINARIES

1 SOME USEFUL CONCEPTS AND RESULTS
1a Algebra : Sum and product notations. Sets. Real-number
system. Sequence, series and their convergence. Binomial series.
Exponential and logarithmic series. Some important algebraic
inequalities. Matrices. Vectors. Determinants. Inverse matrix.
Quadratic forms. Linear equations.
1b Calculus : Concept of a function. Limit of a function.
Meaning of infinity. General results on limits. Some important
limits. Continuity. Derivative of a function. Partial derivatives.
Application of derivatives of functions. Definite integral. Infinite
or improper integral. Indefinite integral. Double and multiple
integrals. Some special integrals.
1c Stirling's approximation

2 NUMERICAL ANALYSIS
2a Inaccuracies and approximations : Different types of
inaccuracies. Rounding off. Significant figures. Absolute, relative and
percentage errors.
2b Interpolation : The problem of interpolation. Finite differences.
Error in a tabular value. Use of the operators Δ, E. Newton's forward
interpolation formula. Newton's backward interpolation formula.
Lagrange's interpolation formula. Divided differences. Newton's
divided difference formula. Central difference formulae : Newton-
Gauss forward and backward, Stirling's, Bessel's and Laplace-
Everett. Remainder terms in interpolation formulae. Bivariate
interpolation.
2c Numerical differentiation.
2d Numerical integration : Trapezoidal rule. Simpson's one-third
rule. Weddle's rule. Relative accuracy of quadrature formulae.
Euler-Maclaurin formula.
2e Numerical solution of equations : Method of false position.
Newton-Raphson method. Method of iteration. Convergence of the
iteration method. Convergence of the Newton-Raphson method.
Horner's method.

3 ELEMENTS OF PROBABILITY THEORY
Meaning of probability. Notation and terminology. Classical
definition of probability. Theorems on the probability of a union of
events. Conditional probability. Statistical independence of events.
Limitations of the classical definition. An axiomatic approach.
Random variable, and its expectation and variance. Joint
distribution of two random variables. Law of large numbers.


Part Two : GENERAL STATISTICAL METHODS

4 INTRODUCTION TO STATISTICAL METHODS
What is statistics ? Is statistics a science ? Statistics in matters of
state. Statistics in business and commerce. Statistics in agriculture.
Statistics in the sciences. A historical note. Statistics in India.

5 COLLECTION AND PRESENTATION OF DATA
Primary data and secondary data. Collection of data. Scrutiny of
data. Frequency data and non-frequency data. Textual and
tabular presentation of data. Diagrammatic representation of data.

6 FREQUENCY DISTRIBUTIONS
Summarisation of data. Attribute and variable. Frequency
distribution of an attribute. Discrete and continuous variables.
Frequency distribution of a variable. Graphical representation of
the frequency distribution of a variable. Frequency curve. Types
of distribution.

7 MEASURES OF CENTRAL TENDENCY
Descriptive measures of statistics. Central tendency. Arithmetic
mean. Median. Mode. Comparison of mean, median and mode.
Other measures of central tendency.

8 MEASURES OF DISPERSION
Meaning of dispersion. Range. Mean deviation. Standard
deviation. Comparison of range, mean deviation and standard
deviation. Measures based on mutual differences of observations.
Quartile deviation. Measures of relative dispersion. Curve of
concentration.

9 MOMENTS AND MEASURES OF SKEWNESS AND KURTOSIS
Moments. Central moments expressed in terms of moments about
an arbitrary origin. Moments about an arbitrary origin expressed
in terms of central moments. Sheppard's corrections for moments.
Skewness. Kurtosis.

10 UNIVARIATE THEORETICAL DISTRIBUTIONS
Introduction. Probability mass and probability density functions.
Characteristics of a theoretical distribution. The hypergeometric
distribution. The binomial distribution. Moments of the binomial
distribution. A recursion relation for moments of the binomial
distribution. Fitting a binomial distribution to an observed
distribution. The Poisson distribution. Moments of the Poisson distri-


B STATISTICAL TABLES
I Ordinates and areas of the distribution of normal deviate.
II Distribution of normal deviate : Values of $\tau_\alpha$.
III $\chi^2$-distribution : Values of $\chi^2_{\alpha,\nu}$.
IV t-distribution : Values of $t_{\alpha,\nu}$.
V F-distribution : Values of $F_\alpha$.
VI Cumulative binomial probabilities of y or fewer successes in n
independent trials with p = 0.5.
VII Critical values of T in the Wilcoxon signed-rank test.
VIII Critical values of U for the Mann-Whitney test.
IX Critical values of r in the run test.
X Critical values of $r_s$, the Spearman rank correlation coefficient.

INDEX


PART ONE 


MATHEMATICAL PRELIMINARIES 


1 SOME USEFUL
CONCEPTS AND RESULTS


In this chapter, we present some concepts and results of algebra 
and calculus which will be frequently used in subsequent chapters. 


1a ALGEBRA


1a.1 Sum and product notations

To simplify the writing of the sum or product of n quantities we 
shall make use of two signs, which we explain below. 

The Greek letter $\Sigma$ (capital ‘sigma’) is used as a sign of
summation. If we have n quantities $x_1, x_2, \ldots, x_n$, their sum is
denoted by
\[ \sum_{i=1}^{n} x_i \tag{1.1} \]
or by
\[ \sum_{1}^{n} x_i. \tag{1.1a} \]

When the range of values of i is evident from the context, one
may write, simply, $\sum_i x_i$.

$\sum_{i=1}^{n} x_i$ is read as ‘sigma $x_i$ for i from 1 to n’.

$\Sigma$ follows certain simple rules which can be easily verified :
(i) $\sum_i (x_i + y_i) = \sum_i x_i + \sum_i y_i$ ;
(ii) $\sum_i k x_i = k \sum_i x_i$, k being a constant ;
(iii) $\sum_{i=1}^{n} (x_i + k) = nk + \sum_{i=1}^{n} x_i$.
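The rules for the summation sign (sum of term-by-term sums, a constant factor taken outside, a constant added to each term) can be checked numerically. The following is only a sketch in Python ; the values in `xs` and `ys` and the constant `k` are hypothetical, chosen so that the arithmetic is exact.

```python
# Hypothetical data: n = 4 quantities x_i and y_i, and a constant k.
xs = [2.0, 5.0, 7.5, 1.5]
ys = [1.0, -3.0, 4.0, 0.5]
k = 3.0
n = len(xs)

# Each pair holds (left-hand side, right-hand side) of one rule.
rule_i = (sum(x + y for x, y in zip(xs, ys)), sum(xs) + sum(ys))
rule_ii = (sum(k * x for x in xs), k * sum(xs))
rule_iii = (sum(x + k for x in xs), n * k + sum(xs))

for name, (lhs, rhs) in [("(i)", rule_i), ("(ii)", rule_ii), ("(iii)", rule_iii)]:
    print(name, lhs, rhs, lhs == rhs)
```

Note that in rule (iii) the constant contributes nk, not k, because it is added once for each of the n terms.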


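Suppose we have m × n quantities arranged in m rows and n
columns as follows :
\[
\begin{matrix}
x_{11} & x_{12} & \cdots & x_{1n} \\
x_{21} & x_{22} & \cdots & x_{2n} \\
\vdots & \vdots &        & \vdots \\
x_{m1} & x_{m2} & \cdots & x_{mn}
\end{matrix}
\]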



Here $x_{ij}$ is the quantity occurring both in the ith row and in the jth
column. The use of two suffixes, i and j, provides a convenient way of
representing the position of a quantity in the above arrangement.
The sum of all the m × n quantities may be obtained by first
adding the quantities in each row and then adding the row totals :
\[
(x_{11} + x_{12} + \cdots + x_{1n}) + (x_{21} + x_{22} + \cdots + x_{2n}) + \cdots
+ (x_{m1} + x_{m2} + \cdots + x_{mn}) = \sum_i \Bigl( \sum_j x_{ij} \Bigr).
\]
This grand total may also be obtained by first adding the
quantities in each column and then adding the column totals :
\[
(x_{11} + x_{21} + \cdots + x_{m1}) + (x_{12} + x_{22} + \cdots + x_{m2}) + \cdots
+ (x_{1n} + x_{2n} + \cdots + x_{mn}) = \sum_j \Bigl( \sum_i x_{ij} \Bigr).
\]
Denoting the grand total by $\sum_{i,j} x_{ij}$, we have thus
\[
\sum_{i,j} x_{ij} = \sum_i \Bigl( \sum_j x_{ij} \Bigr) = \sum_j \Bigl( \sum_i x_{ij} \Bigr), \tag{1.2}
\]


each of which is a double sum. 
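That the grand total does not depend on whether rows or columns are summed first can be seen on a small array. A minimal sketch in Python follows ; the 2 × 3 array `table` is hypothetical.

```python
# Hypothetical array with m = 2 rows and n = 3 columns.
table = [
    [1, 2, 3],
    [4, 5, 6],
]

row_totals = [sum(row) for row in table]           # inner sum over j, one per row i
col_totals = [sum(col) for col in zip(*table)]     # inner sum over i, one per column j

grand_by_rows = sum(row_totals)   # sum over i of (sum over j)
grand_by_cols = sum(col_totals)   # sum over j of (sum over i)

print(row_totals, col_totals, grand_by_rows, grand_by_cols)
```

Here `zip(*table)` transposes the array, so that each `col` runs down one column.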

When we have more than two ways of classification, we shall
have to use more than two suffixes in order to indicate the position
of any quantity. The grand total will then be represented as
\[ \sum_{i,j,k,\ldots} x_{ijk\ldots}, \tag{1.3} \]
which may be obtained by summing the quantities with respect to
the suffixes, taken one by one in any order we like.

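The Greek letter $\Pi$ (capital ‘pi’) is used as a sign of
multiplication. The product of the n quantities $x_1, x_2, \ldots, x_n$ is
denoted by
\[ \prod_{i=1}^{n} x_i \tag{1.4} \]
or
\[ \prod_{1}^{n} x_i \tag{1.4a} \]
or, simply,
\[ \prod_i x_i. \tag{1.4b} \]

$\prod_{i=1}^{n} x_i$ is read as ‘product $x_i$ for i from 1 to n’.

Like $\Sigma$, the product sign also obeys certain simple rules. For
example,
\[ \prod_i (x_i y_i) = \Bigl( \prod_i x_i \Bigr) \Bigl( \prod_i y_i \Bigr), \]
\[ \prod_{i=1}^{n} (k x_i) = k^n \prod_{i=1}^{n} x_i, \]
\[ \prod_i (x_i^k) = \Bigl( \prod_i x_i \Bigr)^k, \]
etc.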
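Rules of this kind for the product sign can be verified on small integers. The sketch below, in Python, checks three such identities : the product of term-by-term products, a constant factor k appearing as k to the power n, and a common power taken outside. The lists `xs`, `ys` and the constant `k` are hypothetical.

```python
import math

# Hypothetical data: n = 4 quantities x_i and y_i, and a constant k.
xs = [1, 2, 3, 4]
ys = [2, 1, 3, 1]
k = 2
n = len(xs)

# Each pair holds (left-hand side, right-hand side) of one identity.
product_of_products = (math.prod(x * y for x, y in zip(xs, ys)),
                       math.prod(xs) * math.prod(ys))
constant_factor = (math.prod(k * x for x in xs), k ** n * math.prod(xs))
common_power = (math.prod(x ** k for x in xs), math.prod(xs) ** k)

print(product_of_products, constant_factor, common_power)
```

`math.prod` is available from Python 3.8 onwards.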
1a.2 Sets


In mathematics, we deal with collections of objects that have
some common quality or feature, and we refer to such a collection
as a set. The objects in a set are called the elements or members of
the set. A State Legislative Assembly is a set whose elements are
the MLAs. A set can be subdivided into smaller sets called subsets.
Some examples of subsets are the set of Congress MLAs, the set of
Communist MLAs and the set of female MLAs. If each element of
a set B is also an element of a set A, then B is called a subset of A.
If, in addition, there is an element of A which is not in B, then B is
called a proper subset of A. It is possible that a set may contain no
elements ; then the set is called an empty set or a null set. For
example, if there be no female MLAs in the State Legislature, then
the subset of female MLAs is an empty set.
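The notions of subset, proper subset and empty set correspond directly to operations on Python's built-in `set` type. The sketch below uses a hypothetical assembly of five members ; the names and ages are placeholders, not real data.

```python
# Hypothetical assembly; names and ages are placeholders.
ages = {"Anil": 52, "Bina": 47, "Chandan": 61, "Dipa": 39, "Eshan": 58}
assembly = set(ages)
party_x = {"Anil", "Bina", "Chandan"}                     # members of one party
women = {m for m in assembly if m in {"Bina", "Dipa"}}    # female members

is_subset = party_x <= assembly          # every element of party_x is in assembly
is_proper_subset = party_x < assembly    # and assembly has an element not in party_x

# A set is a subset of itself, but never a proper subset of itself.
self_subset = assembly <= assembly
self_proper_subset = assembly < assembly

# A subset picked out by a condition that no member satisfies is empty.
centenarians = {m for m in assembly if ages[m] >= 100}

print(is_subset, is_proper_subset, self_subset, self_proper_subset, centenarians)
```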


1a.3 Real-number system

We use the real-number system in elementary algebra. A short
account of this system is given below.

The first step was the invention of the set of counting numbers.
The symbols for the counting numbers in most countries are
1, 2, 3, 4, ... . They answer the question “How many ?”
Another name for a counting number is positive integer. We count
the elements of a finite set by corresponding one and only one
element of the set with one and only one of the positive integers
1, 2, 3, ..., starting with 1 and continuing till each element of
the set has been exhausted.

From this simple beginning, one can expand the concept of 
number by introducing sophisticated ideas. Thus the need for a 
directed number to denote distance along either direction from a 
fixed point on a line suggests the number 0 and the negative integers
−1, −2, −3, ... .

If a system of numbers is to be useful, then it must have the 
property of closure under the four fundamental operations of addition, 
subtraction, multiplication and division. A set of elements is said 
to be closed under an operation if after the operation on the 
elements of the set, the resulting element is also a member of the set. 

It is easy to verify that the set of positive integers is closed under
addition and multiplication, but not under subtraction. If a is a
positive integer, then 0 and −a may be defined as follows :
0 + a = a, a + (−a) = 0.

We now define the set of integers to be the set of numbers composed
of the positive integers, zero and the negative integers. The
introduction of 0 and the set of negative integers makes the set of integers
closed under addition, subtraction and multiplication.

The operation of division requires an extension of our set of 
integers, and this is done by defining a set of rational numbers. A 
quotient or fraction is a ratio of integers where the denominator is 
different from zero. The set of rational numbers is the set of 
numbers each of which can be expressed as the quotient of two 
integers. The set of rational numbers is closed under addition, 
subtraction, multiplication and division. 

If a rational number is converted into decimal form, one of two 
things will happen : 

(i) The decimal terminates (e.g. 3, 0.24). This group contains
all the integers and some rational numbers.

(ii) The decimal does not terminate but repeats (e.g. $4.1\dot{3}$,
$0.04\dot{1}3\dot{2}$). This group contains the remaining rational numbers.
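Why a rational number must fall into one of these two groups can be seen from long division : each step leaves a remainder smaller than the denominator, so either a remainder of zero turns up (the decimal terminates) or some remainder recurs (the digits repeat from that point). The following Python sketch carries out this division ; the function name `decimal_expansion` and the sample fractions are illustrative choices only.

```python
from fractions import Fraction

def decimal_expansion(q):
    """Long division of a non-negative Fraction q.

    Returns (integer part, non-repeating digits, repeating digits);
    the repeating part is empty when the decimal terminates.
    """
    integer_part, remainder = divmod(q.numerator, q.denominator)
    digits = []
    first_seen = {}   # remainder -> position of the digit it produced
    while remainder != 0 and remainder not in first_seen:
        first_seen[remainder] = len(digits)
        remainder *= 10
        digits.append(str(remainder // q.denominator))
        remainder %= q.denominator
    if remainder == 0:
        return integer_part, "".join(digits), ""
    start = first_seen[remainder]
    return integer_part, "".join(digits[:start]), "".join(digits[start:])

print(decimal_expansion(Fraction(6, 25)))    # a terminating decimal
print(decimal_expansion(Fraction(62, 15)))   # a repeating decimal
print(decimal_expansion(Fraction(1, 7)))     # repeats from the first digit
```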




An irrational number is a non-terminating, non-periodic decimal,
e.g. $\sqrt{2} = 1.4142\ldots$, $\sqrt{18} = 4.24264\ldots$. The digits can
be continued endlessly but no sequence of them will repeat. Not all
irrational numbers can be obtained as the result of an algebraic
operation, and an irrational number which is not algebraic is said to
be transcendental. Typical examples are the base of natural logarithms
(e) and the ratio of the circumference to the diameter of a circle ($\pi$).

We now define the set of real numbers to be the set of numbers 
consisting of the set of rational numbers and the set of irrational 
numbers. The set of real numbers is closed under the four funda- 
mental operations. The relations between various subsets of the set 
of real numbers will be clear from the following chart :

real numbers
    rational numbers
        integers
            positive integers
            zero
            negative integers
        fractions
    irrational numbers


1a.4 Sequence, series and their convergence 

A sequence is a set of quantities $s_1, s_2, \ldots$, called elements,
which can be arranged in an order, so that when n is given, the nth
element of the sequence is completely specified. The elements are
arranged by matching them one by one with the positive integers
1, 2, ..., n, ... . A common symbol for a sequence is $\{s_n\}$.

A sequence $\{s_n\}$ is said to be bounded if there exists a
positive number M such that $|s_n| < M$ for all n. Otherwise, the
sequence is said to be unbounded. If there exists an l such that for
every choice of $\epsilon > 0$, there is an N such that $|s_n - l| < \epsilon$ for all $n > N$,
then the sequence $\{s_n\}$ is convergent and has the limit l. It is
customary to represent this by $\lim_{n \to \infty} s_n = l$. We shall see more of limits in
Section 1b.2. A sequence $\{s_n\}$ which is not convergent is called
divergent. For example, the sequence of integers $\{n\}$ is divergent.
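The definition of convergence may be illustrated with the sequence whose nth element is 1/n and whose limit is 0 : for a given tolerance, 1/n is smaller than the tolerance as soon as n exceeds its reciprocal. The Python sketch below finds such an N for a few hypothetical tolerances and then checks the next thousand elements. It is a finite spot check, not a proof, and the names `s`, `index_for` and `checks` are illustrative.

```python
import math

def s(n):
    """nth element of the sequence {1/n}, whose limit is 0."""
    return 1.0 / n

def index_for(eps):
    """An N such that |1/n - 0| < eps for every n > N (since 1/n < eps when n > 1/eps)."""
    return math.ceil(1.0 / eps)

limit = 0.0
checks = {}
for eps in (0.5, 0.1, 0.001):   # hypothetical tolerances
    N = index_for(eps)
    # Spot check on the 1000 elements following s_N.
    checks[eps] = all(abs(s(n) - limit) < eps for n in range(N + 1, N + 1001))

print(checks)
```

No such N can be found for the sequence {n}, whatever limit is proposed, which is why that sequence is divergent.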

A series is an expression of the form $a_1 + a_2 + \cdots + a_n + \cdots$,
which may have a finite or an infinite number of terms. Its partial
sums, $s_1 = a_1$, $s_2 = a_1 + a_2$, ..., $s_n = \sum_{i=1}^{n} a_i$, ..., constitute a sequence
$\{s_n\}$. A series is said to be convergent if its sequence of partial sums
is convergent. If $\lim_{n \to \infty} s_n = s$, then s is the sum of the convergent
infinite series and we write $\sum_{i=1}^{\infty} a_i = s$. If $\lim_{n \to \infty} s_n = +\infty$, then the series is said
to be divergent to $+\infty$. A series is said to oscillate if the sequence of
partial sums does not tend to a definite limit but oscillates between
two values, say, m and M. A simple example of an oscillating series
is $1 - 1 + 1 - 1 + \cdots + (-1)^{n-1} + \cdots$. This oscillates between 0
and 1. A series is absolutely convergent if the series of corresponding
absolute values, $|a_1| + |a_2| + \cdots + |a_n| + \cdots$, is convergent. A
series such as $1 - \frac{1}{2} + \frac{1}{3} - \frac{1}{4} + \cdots$, which is convergent but not
absolutely convergent, is said to be conditionally convergent. Convergent
series are usually the most useful type for practical applications ;
hence it is of importance to test a series for convergence. The
comparison test and the ratio test (due to d'Alembert) are most
often used to test convergence of a series.

It can be seen that an absolutely convergent series is necessarily 
convergent. The merit of an absolutely convergent series is that 
any rearrangement of its terms leaves its absolute convergence 
intact and the sum of the new series also remains the same. 
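The behaviour of a series is read off from its partial sums, and these are easily tabulated. The Python sketch below computes partial sums for three series : the oscillating series 1 − 1 + 1 − ..., the alternating series 1 − 1/2 + 1/3 − ... (whose sum is known to be the natural logarithm of 2), and the series of its absolute values, 1 + 1/2 + 1/3 + ..., which diverges. The numbers of terms taken are arbitrary choices.

```python
import math
from itertools import accumulate

def partial_sums(terms):
    """Return the list s_1, s_2, ... of running totals of the given terms."""
    return list(accumulate(terms))

# Oscillating series: partial sums alternate between 1 and 0.
oscillating = partial_sums((-1) ** (n - 1) for n in range(1, 9))

# Conditionally convergent series and the series of its absolute values.
alternating = partial_sums((-1) ** (n - 1) / n for n in range(1, 100001))
absolute = partial_sums(1 / n for n in range(1, 100001))

print(oscillating)
print(alternating[-1], math.log(2))   # close to each other
print(absolute[-1])                   # still growing, though slowly
```

The last partial sum of the alternating series settles near log 2, while the partial sums of the absolute values keep increasing without bound (roughly like log n), which is what conditional convergence means.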


1a.5 Binomial series

A binomial is an expression that contains exactly two terms. In
this section, we first give a formula which enables us to expand any
positive integral power of a binomial. By actual multiplication, we
obtain the following expansions :
\[ (x+y)^1 = x + y, \]
\[ (x+y)^2 = x^2 + 2xy + y^2, \]
\[ (x+y)^3 = x^3 + 3x^2 y + 3xy^2 + y^3. \]

The general formula is
\[
(x+y)^n = \binom{n}{0} x^n + \binom{n}{1} x^{n-1} y + \binom{n}{2} x^{n-2} y^2 + \cdots
+ \binom{n}{r} x^{n-r} y^r + \cdots + \binom{n}{n} y^n, \tag{1.5}
\]
which holds for positive integral values of n.*

* $\binom{n}{r}$ stands for the expression $n(n-1)\cdots(n-r+1)/r!$, where $r! = r(r-1)\cdots 2 \times 1$.
A similar symbol is $(n)_r = n(n-1)\cdots(n-r+1)$.

When n is not a positive integer (i.e. is negative or fractional),
some restriction has to be put on x and y. In this case, if $|y| < |x|$,
where $|k|$, called ‘modulus k’, denotes the numerical or absolute
value of k, then one can write
\[
(x+y)^n = \binom{n}{0} x^n + \binom{n}{1} x^{n-1} y + \binom{n}{2} x^{n-2} y^2 + \cdots
+ \binom{n}{r} x^{n-r} y^r + \cdots \ \text{ad inf.} \tag{1.6}
\]
(1.6) may be looked upon as the most general form of the binomial
series, which includes (1.5) as a particular case.
Two important results, which follow from (1.6), are
\[ (1+x)^{-1} = 1 - x + x^2 - \cdots + (-1)^r x^r + \cdots \]
and
\[ (1-x)^{-1} = 1 + x + x^2 + \cdots + x^r + \cdots, \tag{1.7} \]
which are valid for $|x| < 1$.
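Both forms of the binomial series can be checked numerically. In the Python sketch below, `binomial_expand` sums the finite expansion for a positive integer n, while `general_coefficient` evaluates n(n−1)...(n−r+1)/r! for any real n so that a truncated infinite series can be compared with the directly computed power. The function names, the sample values and the number of terms retained are illustrative assumptions.

```python
import math

def binomial_expand(x, y, n):
    """Sum of the finite expansion of (x + y)^n for a positive integer n."""
    return sum(math.comb(n, r) * x ** (n - r) * y ** r for r in range(n + 1))

def general_coefficient(n, r):
    """n(n-1)...(n-r+1)/r! for any real n and non-negative integer r."""
    c = 1.0
    for j in range(r):
        c *= (n - j) / (j + 1)
    return c

def binomial_series(x, y, n, terms):
    """First `terms` terms of the infinite series for (x + y)^n, assuming |y| < |x|."""
    return sum(general_coefficient(n, r) * x ** (n - r) * y ** r for r in range(terms))

print(binomial_expand(2, 3, 5), (2 + 3) ** 5)
print(binomial_series(4.0, 1.0, 0.5, 60), math.sqrt(5.0))    # n = 1/2
print(binomial_series(1.0, -0.5, -1.0, 60), 1 / (1 - 0.5))   # the series for (1 - x)^(-1)
```

With y larger than x in absolute value the truncated sums would not settle down, which is the reason for the restriction stated in the text.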


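1a.6 Exponential and logarithmic series

The series
\[ 1 + \frac{1}{1!} + \frac{1}{2!} + \frac{1}{3!} + \cdots + \frac{1}{r!} + \cdots \]
is denoted by the letter e. It can be shown that the xth power of
any positive number a is
\[
a^x = 1 + x \log_e a + \frac{x^2}{2!} (\log_e a)^2 + \cdots + \frac{x^r}{r!} (\log_e a)^r + \cdots. \tag{1.8}
\]
This is the most general form of the exponential series. As a
particular case, we have, by putting a = e,
\[ e^x = 1 + x + \frac{x^2}{2!} + \cdots + \frac{x^r}{r!} + \cdots. \tag{1.9} \]

The logarithmic series, which gives the expansion of $\log_e (1+x)$ in
ascending powers of x, is
\[
\log_e (1+x) = x - \frac{x^2}{2} + \frac{x^3}{3} - \cdots + (-1)^{r-1} \frac{x^r}{r} + \cdots. \tag{1.10}
\]
From this, we have also
\[
\log_e (1-x) = -x - \frac{x^2}{2} - \frac{x^3}{3} - \cdots - \frac{x^r}{r} - \cdots. \tag{1.11}
\]
Both (1.10) and (1.11) are valid for $|x| < 1$.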
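Truncated versions of the exponential and logarithmic series can be compared with the library functions. The Python sketch below is illustrative only ; the function names and the numbers of terms retained (30 and 200) are arbitrary choices that happen to suffice for the sample arguments.

```python
import math

def exp_series(x, terms=30):
    """First `terms` terms of 1 + x + x^2/2! + ... ."""
    return sum(x ** r / math.factorial(r) for r in range(terms))

def log1p_series(x, terms=200):
    """First `terms` terms of x - x^2/2 + x^3/3 - ..., intended for |x| < 1."""
    return sum((-1) ** (r - 1) * x ** r / r for r in range(1, terms + 1))

print(exp_series(1.0), math.e)                   # the series defining e
print(exp_series(3 * math.log(2.0)), 2.0 ** 3)   # a^x obtained as the series in x log a
print(log1p_series(0.5), math.log(1.5))
print(log1p_series(-0.5), math.log(0.5))         # log(1 - x) with x = 0.5
```

The second line uses the fact that the series for the xth power of a is the exponential series evaluated at x log a.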


1a.7 Some important algebraic inequalities

(i) If a and b be any two numbers, then
\[ |a+b| \le |a| + |b|. \]
Proof : Whatever the signs of a and b, we shall have
\[ -|a| \le a \le |a| \]
and
\[ -|b| \le b \le |b|, \]
so that
\[ -|a| - |b| \le a + b \le |a| + |b|, \]
i.e.
\[ |a+b| \le |a| + |b|. \tag{1.12} \]
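(ii) Let $x_1, x_2, \ldots, x_n$ be a set of positive quantities and suppose
\[ A = \frac{x_1 + x_2 + \cdots + x_n}{n}, \]
\[ G = (x_1 x_2 \cdots x_n)^{1/n} \]
and
\[ H = \frac{n}{\dfrac{1}{x_1} + \dfrac{1}{x_2} + \cdots + \dfrac{1}{x_n}}. \]
Then
\[ A \ge G \ge H. \]

Proof : Since the square of a real quantity is necessarily
non-negative,
\[ (\sqrt{x_1} - \sqrt{x_2})^2 \ge 0, \]
so that
\[ \frac{x_1 + x_2}{2} \ge \sqrt{x_1 x_2}. \]
From this again, we have
\[
\frac{x_1 + x_2 + x_3 + x_4}{4}
= \frac{\dfrac{x_1 + x_2}{2} + \dfrac{x_3 + x_4}{2}}{2}
\ge \frac{\sqrt{x_1 x_2} + \sqrt{x_3 x_4}}{2}
\ge \sqrt{\sqrt{x_1 x_2} \sqrt{x_3 x_4}},
\]
i.e.
\[ \frac{x_1 + x_2 + x_3 + x_4}{4} \ge (x_1 x_2 x_3 x_4)^{1/4}. \]

Proceeding in this way, we can prove the result $A \ge G$ for $n = 2^m$,
where m is a positive integer. It will now be shown that the
inequality also holds when n is not of the form $2^m$.

Let us take the smallest m such that $n < 2^m$. Then, along with
$x_1, x_2, \ldots, x_n$, we may consider $2^m - n$ other positive quantities,
$x_{n+1}, x_{n+2}, \ldots, x_{2^m}$,
all equal to A.
According to the result proved above,
\[
\frac{x_1 + x_2 + \cdots + x_n + x_{n+1} + \cdots + x_{2^m}}{2^m}
\ge (x_1 x_2 \cdots x_n x_{n+1} \cdots x_{2^m})^{1/2^m}
\]
or
\[ \frac{nA + (2^m - n)A}{2^m} \ge (G^n A^{2^m - n})^{1/2^m} \]
or
\[ A^{2^m} \ge G^n A^{2^m - n} \]
or
\[ A^n \ge G^n, \quad \text{or} \quad A \ge G. \tag{1.13} \]
Since $\dfrac{1}{x_1}, \dfrac{1}{x_2}, \ldots, \dfrac{1}{x_n}$ are themselves positive quantities, we
have also
\[
\frac{\dfrac{1}{x_1} + \dfrac{1}{x_2} + \cdots + \dfrac{1}{x_n}}{n}
\ge \Bigl( \frac{1}{x_1} \cdot \frac{1}{x_2} \cdots \frac{1}{x_n} \Bigr)^{1/n},
\]
i.e.
\[ \frac{1}{H} \ge \frac{1}{G}, \]
so that
\[ G \ge H. \tag{1.14} \]
Combining (1.13) and (1.14), we can write
\[ A \ge G \ge H. \tag{1.15} \]
The equality signs hold when all the n quantities are equal.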
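The ordering of the three quantities A, G and H (the arithmetic, geometric and harmonic means) can be observed on any set of positive numbers. A minimal sketch in Python follows ; the function name `means` and the data are hypothetical, and the geometric mean is computed through logarithms only to avoid overflow in long products.

```python
import math

def means(xs):
    """Return (A, G, H) for a list of positive numbers."""
    n = len(xs)
    a = sum(xs) / n
    g = math.exp(sum(math.log(x) for x in xs) / n)   # nth root of the product
    h = n / sum(1 / x for x in xs)
    return a, g, h

a, g, h = means([1.0, 4.0, 16.0])        # unequal quantities
print(a, g, h)

a_eq, g_eq, h_eq = means([3.0, 3.0, 3.0])   # all quantities equal
print(a_eq, g_eq, h_eq)
```

For the unequal values the three means are strictly ordered ; for the equal values they coincide, up to rounding in the floating-point arithmetic.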
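(iii) If $x_1, x_2, \ldots, x_n$ and $y_1, y_2, \ldots, y_n$ be two sets of real
numbers, then
\[
(x_1^2 + x_2^2 + \cdots + x_n^2)(y_1^2 + y_2^2 + \cdots + y_n^2) \ge
(x_1 y_1 + x_2 y_2 + \cdots + x_n y_n)^2.
\]
Proof : We first note that if $x_1, x_2, \ldots, x_n$ are all equal to
zero and/or $y_1, y_2, \ldots, y_n$ are all equal to zero, the inequality
is trivially true, both sides being equal to zero. So we assume,
without any loss of generality, that there is at least one non-zero
value in the set of x’s as well as in the set of y’s.

Let us denote by A, B and C, respectively, the sums
$x_1^2 + x_2^2 + \cdots + x_n^2$, $y_1^2 + y_2^2 + \cdots + y_n^2$
and $x_1 y_1 + x_2 y_2 + \cdots + x_n y_n$.
From the familiar inequality $a^2 + b^2 \ge 2|ab|$, we have
\[ \frac{x_1^2}{A} + \frac{y_1^2}{B} \ge \frac{2|x_1 y_1|}{\sqrt{AB}}, \]
\[ \frac{x_2^2}{A} + \frac{y_2^2}{B} \ge \frac{2|x_2 y_2|}{\sqrt{AB}}, \]
\[ \cdots \]
\[ \frac{x_n^2}{A} + \frac{y_n^2}{B} \ge \frac{2|x_n y_n|}{\sqrt{AB}}. \]
Adding the above n inequalities, we get
\[
\frac{x_1^2 + x_2^2 + \cdots + x_n^2}{A} + \frac{y_1^2 + y_2^2 + \cdots + y_n^2}{B}
\ge \frac{2(|x_1 y_1| + |x_2 y_2| + \cdots + |x_n y_n|)}{\sqrt{AB}},
\]
i.e., since the left-hand side equals 2,
\[ \frac{|x_1 y_1| + |x_2 y_2| + \cdots + |x_n y_n|}{\sqrt{AB}} \le 1. \]
Hence
\[
AB \ge (|x_1 y_1| + |x_2 y_2| + \cdots + |x_n y_n|)^2 \ge (x_1 y_1 + x_2 y_2 + \cdots + x_n y_n)^2 = C^2,
\]
i.e.
\[
(x_1^2 + x_2^2 + \cdots + x_n^2)(y_1^2 + y_2^2 + \cdots + y_n^2) \ge
(x_1 y_1 + x_2 y_2 + \cdots + x_n y_n)^2. \tag{1.16}
\]
This is known as the Cauchy-Schwarz inequality. The equality sign holds
when $x_i = k y_i$ (or $y_i = k x_i$), k being a constant, for i = 1, 2, ..., n.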
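The Cauchy-Schwarz inequality, including the case of equality for proportional sets, can be verified on small integer data. In the Python sketch below, `cauchy_schwarz_sides` returns the product of the two sums of squares and the square of the sum of products ; the function name and the data are hypothetical.

```python
def cauchy_schwarz_sides(xs, ys):
    """Return (A*B, C*C), where A and B are the sums of squares of xs and ys
    and C is the sum of the products x_i * y_i."""
    a = sum(x * x for x in xs)
    b = sum(y * y for y in ys)
    c = sum(x * y for x, y in zip(xs, ys))
    return a * b, c * c

general = cauchy_schwarz_sides([1, 2, 3], [4, -5, 6])     # sets not proportional
proportional = cauchy_schwarz_sides([1, 2, 3], [2, 4, 6])  # y_i = 2 * x_i

print(general)        # first entry exceeds the second
print(proportional)   # the two entries are equal
```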


1a.8 Matrices

A matrix A of order m × n is a rectangular array of numbers or
elements arranged in m rows and n columns as
\[
A = \begin{pmatrix}
a_{11} & a_{12} & \cdots & a_{1n} \\
a_{21} & a_{22} & \cdots & a_{2n} \\
\vdots & \vdots &        & \vdots \\
a_{m1} & a_{m2} & \cdots & a_{mn}
\end{pmatrix},
\]
where the elements are represented by $a_{ij}$. The subscripts i and j
refer, respectively, to the row number and the column number of the
element in the array. To emphasize the order, we may write $A_{m \times n}$
instead of A. We shall assume that the elements $a_{ij}$ are real numbers.


For example, 
2 0 
E 
zgi sr! 


is a 3 × 2 matrix. The matrix is nothing more than a set of elements
arranged in a convenient manner. For example, the prices of m
commodities for n periods of time may be represented by an m × n
matrix, where the element $a_{ij}$ is the price of the ith commodity for
the jth period.

Two matrices $A = (a_{ij})$ and $B = (b_{ij})$ are called equal, and one
writes A = B, if and only if (1) they are of the same order and (2) $a_{ij} = b_{ij}$
for all i and j.

A matrix with the same number of rows as of columns is called a
square matrix. The elements $a_{ii}$ of a square matrix form the main
diagonal of the matrix. A square matrix is symmetric if $a_{ij} = a_{ji}$ for all i
and j. For example,
abo 
(: c 1) 
odo 


is a symmetric matrix of order 3. 
We shall now define the operations of addition, subtraction and
multiplication with matrices.
(i) The product of a matrix and a real number (scalar) c is
defined as follows:

$$cA = c(a_{ij}) = (ca_{ij}),$$
i.e. a matrix each element of which is c times the corresponding
element of A. We take, by convention, cA = Ac. (−1)A is written
as −A.
(ii) The sum of A and B is defined only when they are of the
same order, and then
$$A + B = (a_{ij}) + (b_{ij}) = (a_{ij} + b_{ij}).$$
Thus the sum is again a matrix of the same order obtained by adding
corresponding elements of the two matrices. Similarly, we define
the difference A − B of the two matrices of the same order as

$$A - B = (a_{ij}) - (b_{ij}) = (a_{ij} - b_{ij}),$$
i.e. as a matrix of the same order with elements $a_{ij} - b_{ij}$. (Note that
B − A will be different from A − B unless A = B.)

(iii) The product of two matrices is defined only when the number
of columns of the first matrix is the same as the number of rows of
the second. The product of $A_{m \times p}$ and $B_{p \times n}$ is a matrix $C_{m \times n}$, where
the elements of C are given by

$$c_{ij} = \sum_{k=1}^{p} a_{ik}b_{kj}.$$
Thus
$$A_{m \times p}B_{p \times n} = (a_{ik})(b_{kj}) = \Big(\sum_{k} a_{ik}b_{kj}\Big) = C_{m \times n}.$$

It is to be noted that $B_{p \times n}A_{m \times p}$ is not defined unless m = n. Also,
even for square matrices of the same order, AB need not be the same
as BA.

These operations can be defined easily, step by step, for more 
than two matrices. 

All the matrix operations, defined so far, are associative and 
distributive. Except for subtraction and the third operation, they are 
also commutative. 

The transpose, A', of a matrix A is A with its rows and columns
interchanged. Thus if $A_{m \times n} = (a_{ij})$, then $A' = A'_{n \times m} = (a'_{ij})$, where
$a'_{ij} = a_{ji}$ for all i and j. It is easy to verify that

$$(A')' = A, \quad (A + B)' = A' + B', \quad (AB)' = B'A'.$$
Thus a square matrix is symmetric if it is equal to its transpose.
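
The product rule and the transpose rule for a product can be tried out on small matrices. This is a minimal sketch with matrices stored as lists of rows; the helper names `mat_mul` and `transpose` and the two sample matrices are assumptions made for illustration.

```python
def mat_mul(a, b):
    """Product of A (m x p) and B (p x n): c_ij = sum_k a_ik * b_kj."""
    if len(a[0]) != len(b):
        raise ValueError("columns of the first matrix must equal rows of the second")
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]

def transpose(a):
    """Interchange the rows and columns of A."""
    return [list(row) for row in zip(*a)]

A = [[1, 2], [3, 4]]
B = [[0, 1], [1, 0]]

AB = mat_mul(A, B)
BA = mat_mul(B, A)
print(AB)  # [[2, 1], [4, 3]]
print(BA)  # [[3, 4], [1, 2]]

# (AB)' equals B'A', although AB and BA differ.
print(transpose(AB) == mat_mul(transpose(B), transpose(A)))  # True
```

Even for these two square matrices of the same order, AB and BA are different, which illustrates why matrix multiplication is not commutative.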

A diagonal matrix is a square matrix with all its non-diagonal
elements equal to zero, and hence it is symmetric too.

A unit matrix (or identity matrix) I is a diagonal matrix with all
diagonal elements equal to 1. Thus

$$I = (\delta_{ij}),$$
where $\delta_{ii} = 1$ and $\delta_{ij} = 0$ for $i \ne j$.

It is easy to verify that

$$I_{m \times m}A_{m \times n} = A_{m \times n}I_{n \times n} = A_{m \times n}.$$


A null matrix is a matrix with all elements equal to zero and is
denoted by O. Obviously, we have

$$O_{p \times m}A_{m \times n} = O_{p \times n}$$
and
$$A_{m \times n} + O_{m \times n} = A_{m \times n}.$$
A square matrix C is called an orthogonal matrix if it is such that
$$C'C = CC' = I.$$


1a.9 Vectors 

A vector is an ordered set of numbers arranged in one row or one 
column. A row vector of order n is a matrix of order 1 xn, while a 
column vector of order n is a matrix of order nx 1. The transpose of a 
row vector is a column vector, and conversely. 

We shall use a symbol like x to denote a column vector and a 
symbol like x’ to denote a row vector. 

Since a vector is a special type of matrix, the operations of
addition, subtraction and multiplication are also defined for vectors
with the same type of restrictions on the orders of the vectors. If x
and y are two column vectors of order n, i.e. if

$$x = \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix}, \quad y = \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{pmatrix},$$
then
$$x'x = \sum_{i} x_i^2, \quad x'y = y'x = \sum_{i} x_iy_i.$$

A set of vectors, $x_1, x_2, \ldots, x_p$, will be said to be linearly dependent
if there exist numbers $c_1, c_2, \ldots, c_p$, not all equal to zero, such that
$$c_1x_1 + c_2x_2 + \cdots + c_px_p = 0,$$

where 0 is the null vector (i.e. the vector with all elements equal to
zero) of the same order. Clearly, if the set includes any null vector,
then it is necessarily linearly dependent.

The vectors will be said to be linearly independent if there do not
exist $c_i$, not all equal to zero, such that the above relation is true.


1a.10 Determinants
To every square matrix $A_{n \times n} = (a_{ij})$, there corresponds a number
$\Delta$ known as the determinant of the matrix A. It is also denoted by
$|A|$ or $|a_{ij}|$ or

$$\begin{vmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nn} \end{vmatrix},$$
where $a_{ij}$ is the element of the ith row and the jth column. It is
defined as the sum of n! terms of the form

$$\pm a_{1\alpha}a_{2\beta} \cdots a_{n\nu}, \qquad \ldots\ (1.17)$$
there being one term corresponding to each permutation $(\alpha\ \beta \ldots \nu)$
of the numbers 1, 2, ..., n. The sign of a term is determined
according to the following rule:
If the number of inversions in $(\alpha\ \beta \ldots \nu)$ is even (including zero),
then the term
$$a_{1\alpha}a_{2\beta} \cdots a_{n\nu}$$
will have a plus sign. If, on the other hand, the number of inversions
is odd, the term will have a minus sign.
(When two numbers in a permutation are not in natural order,
the greater number preceding the smaller, like the 4 and 3 in [1243]
or the 5 and 2 in [15342], they are said to give an inversion. The
total number of inversions is 1 in the former permutation and 5 in
the latter.)
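
The definition by permutations translates directly into a short program. The sketch below is for illustration only (it takes n! steps, so it is practical only for very small n); the names `inversions` and `det_by_permutations` and the sample matrices are assumptions.

```python
from itertools import permutations

def inversions(perm):
    """Count pairs in which a greater number precedes a smaller one."""
    return sum(1
               for i in range(len(perm))
               for j in range(i + 1, len(perm))
               if perm[i] > perm[j])

def det_by_permutations(a):
    """Sum of n! signed terms a[0][p0] * a[1][p1] * ... as in (1.17).
    Columns are numbered from 0 here, which leaves inversion counts unchanged."""
    n = len(a)
    total = 0
    for perm in permutations(range(n)):
        sign = -1 if inversions(perm) % 2 else 1
        term = 1
        for row, col in enumerate(perm):
            term *= a[row][col]
        total += sign * term
    return total

# The two permutations quoted in the text.
print(inversions((1, 2, 4, 3)))     # 1
print(inversions((1, 5, 3, 4, 2)))  # 5

print(det_by_permutations([[1, 2], [3, 4]]))                   # -2
print(det_by_permutations([[2, 0, 1], [1, 3, 2], [1, 1, 4]]))  # 18
```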
The minor of an element $a_{ij}$ in the determinant is defined to be the
determinant (of order n − 1) formed by deleting the ith row and the jth
column. The co-factor of $a_{ij}$ is

$$(-1)^{i+j} \text{ times its minor}$$

and is generally denoted by $A_{ij}$. For instance, in the present case


To evaluate a determinant one may use the relation:
$$\Delta = \sum_{j=1}^{n} a_{ij}A_{ij} \quad (\text{for any } i = 1, 2, \ldots, n), \qquad \ldots\ (1.18)$$

which expresses the determinant $\Delta$ in terms of the elements of the ith row and the
corresponding co-factors. The other type of relation, which expresses
$\Delta$ in terms of the elements of the jth column and the corresponding
co-factors, is
$$\Delta = \sum_{i=1}^{n} a_{ij}A_{ij} \quad (\text{for any } j = 1, 2, \ldots, n). \qquad \ldots\ (1.19)$$


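We state below a useful property of a determinant:
$$\left.\begin{aligned} \sum_{j} a_{ij}A_{kj} &= 0 \quad \text{for } i \ne k, \\ \sum_{i} a_{ij}A_{ik} &= 0 \quad \text{for } j \ne k. \end{aligned}\right\} \qquad \ldots\ (1.20)$$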


It is easy to verify that 
(i) the effect of multiplying the elements of a row (or a
column) of a determinant by a number c is to multiply the
determinant by c;
(ii) the value of a determinant remains unchanged if its rows
are changed into columns (and vice versa);
(iii) the effect of interchanging two rows or two columns of a
determinant is to change its sign;
(iv) a determinant with two identical rows or columns is zero.
When A is not a square matrix, the determinant of any square 
sub-matrix of A is called a minor of A. 
The rank of a matrix A is the greatest integer r such that A has 
at least one minor of order r which is different from zero. 


1a.11 Inverse matrix
A square matrix $A_{n \times n}$ with $|A| \ne 0$ is called a non-singular matrix.
$A_{n \times n}$ then has rank n. $A_{n \times n}$ is called a singular matrix if $|A_{n \times n}| = 0$.
Non-singular matrices have corresponding inverse matrices. When
A is non-singular, the matrix $A^{-1}$, defined by

$$A^{-1} = \left(\frac{A_{ji}}{|A|}\right), \qquad \ldots\ (1.21)$$
is called the inverse or reciprocal of A. Here $A_{ji}/|A|$ is the element of the
ith row and jth column of $A^{-1}$. Usually, $A_{ji}/|A|$ is denoted by $a^{ij}$ and
$A^{-1}$ is then represented by $(a^{ij})$. Two useful properties of inverse
matrices are:
(i) $|A^{-1}| = 1/|A|$, and (ii) if A is symmetric, then so is $A^{-1}$.
For an orthogonal matrix C, we have $C^{-1} = C'$.
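
As an illustration of (1.21), the inverse can be built element by element from co-factors. The sketch below assumes integer (or `Fraction`) entries so that the arithmetic stays exact; the helper names and the sample matrix are assumptions, and the recursive determinant is meant for small matrices only.

```python
from fractions import Fraction

def minor_matrix(a, i, j):
    """Delete row i and column j of A."""
    return [row[:j] + row[j + 1:] for k, row in enumerate(a) if k != i]

def det(a):
    """Determinant by expansion along the first row; det of the empty matrix is 1."""
    if not a:
        return 1
    return sum((-1) ** j * a[0][j] * det(minor_matrix(a, 0, j))
               for j in range(len(a)))

def cofactor(a, i, j):
    """(-1)^(i+j) times the minor of a_ij."""
    return (-1) ** (i + j) * det(minor_matrix(a, i, j))

def inverse(a):
    """Element (i, j) of the inverse is A_ji / |A|, as in (1.21)."""
    d = det(a)
    if d == 0:
        raise ValueError("a singular matrix has no inverse")
    n = len(a)
    return [[Fraction(cofactor(a, j, i), d) for j in range(n)]
            for i in range(n)]

A = [[2, 1], [1, 3]]          # symmetric, |A| = 5
A_inv = inverse(A)
print(A_inv)
print(det(A_inv) == Fraction(1, det(A)))  # True: |A^-1| = 1/|A|
```

Because the sample matrix is symmetric, its inverse comes out symmetric as well, in line with property (ii).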




1a.12 Quadratic forms
The expression

$$\sum_{i=1}^{n}\sum_{j=1}^{n} a_{ij}x_ix_j, \qquad \ldots\ (1.22)$$
where one may take $a_{ij} = a_{ji}$, is called a quadratic form in the variables
$x_1, x_2, \ldots, x_n$. In matrix notation, it is x'Ax and is denoted by
Q(x) or $Q(x_1, x_2, \ldots, x_n)$. A is the matrix of the form Q. The
rank of a quadratic form is, by definition, the same as the rank of
the associated matrix A. As indicated above, A is taken to be a
symmetric matrix. If A = I, then

$$Q(x) = x'Ix = \sum_{i=1}^{n} x_i^2.$$
If we apply a linear transformation
$$x_i = \sum_{j=1}^{m} b_{ij}y_j, \quad i = 1, 2, \ldots, n,$$

which in matrix notation is x = By, with $B = B_{n \times m}$, to Q(x), we get
a new quadratic form Q(y):

$$Q(x_1, x_2, \ldots, x_n) = x'Ax = y'B'ABy = Q(y_1, y_2, \ldots, y_m),$$
where B'AB is the matrix of the transformed form and is symmetric.
If for every set of real values of $x_1, x_2, \ldots, x_n$,
$$Q(x_1, x_2, \ldots, x_n) \ge 0,$$


then the form is said to be non-negative. If in the above relation 
equality sign holds if and only if x; are all equal to zero, then the 
form is said to be positive definite. A positive semi-definite form is a non- 
negative form which is not positive definite. 

A linear transformation x =Cy, using an orthogonal matrix C, is 
called an orthogonal transformation. Orthogonal transformations are 
frequently used in statistical theory. 
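
The effect of a linear transformation x = By on a quadratic form can be verified on a small case. This is a sketch under assumed sample values; the names `quad_form` and `transformed_matrix` are illustrative.

```python
def quad_form(a, x):
    """Q(x) = x'Ax = sum_i sum_j a_ij * x_i * x_j."""
    n = len(x)
    return sum(a[i][j] * x[i] * x[j] for i in range(n) for j in range(n))

def transformed_matrix(a, b):
    """Return B'AB for A (n x n) and B (n x m); element (r, s) is
    sum_i sum_j b_ir * a_ij * b_js."""
    n, m = len(b), len(b[0])
    return [[sum(b[i][r] * a[i][j] * b[j][s]
                 for i in range(n) for j in range(n))
             for s in range(m)]
            for r in range(m)]

A = [[2, 1], [1, 3]]   # symmetric matrix of the form (assumed example)
B = [[1, 1], [0, 1]]   # transformation x = By
y = [1, 2]
x = [sum(B[i][j] * y[j] for j in range(len(y))) for i in range(len(B))]

BAB = transformed_matrix(A, B)
print(BAB)                                   # symmetric, as stated in the text
print(quad_form(A, x), quad_form(BAB, y))    # the two values agree
```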


1a.13 Linear equations 
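A simultaneous system
$$\left.\begin{aligned} a_{11}x_1 + a_{12}x_2 + \cdots + a_{1n}x_n &= k_1 \\ \vdots \qquad\qquad & \\ a_{m1}x_1 + a_{m2}x_2 + \cdots + a_{mn}x_n &= k_m \end{aligned}\right\} \qquad \ldots\ (1.23)$$
of m linear equations in n unknowns $x_1, x_2, \ldots, x_n$ has for a solution
a set of n quantities $h_1, h_2, \ldots, h_n$ if, on substitution in (1.23), it
reduces the left-hand sides of (1.23) to $k_1, k_2, \ldots, k_m$.

In matrix notation, (1.23) can be written as

$$Ax = k, \qquad \ldots\ (1.24)$$
where
$$A = \begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{pmatrix}, \quad x = \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix} \quad \text{and} \quad k = \begin{pmatrix} k_1 \\ k_2 \\ \vdots \\ k_m \end{pmatrix}.$$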


When k is the null vector, (1.24) is said to be a homogeneous system,
and it is called non-homogeneous when k is non-null.

Let m = n. Consider the non-homogeneous system with A a
non-singular matrix. Then we obtain the unique solution of the system
by pre-multiplying both sides of (1.24) by $A^{-1}$, which gives

$$x = A^{-1}k$$

or, in explicit form,

$$x_i = \frac{1}{|A|}\sum_{j=1}^{n} k_jA_{ji}, \quad i = 1, 2, \ldots, n. \qquad \ldots\ (1.25)$$
This is known as Cramer's rule for solving linear equations.
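
Formula (1.25) can be applied mechanically to a small system. The following sketch assumes integer or `Fraction` inputs so that results stay exact; `solve_cramer` and its helpers are hypothetical names, and the sample system is an assumed example.

```python
from fractions import Fraction

def minor_matrix(a, i, j):
    """Delete row i and column j of A."""
    return [row[:j] + row[j + 1:] for r, row in enumerate(a) if r != i]

def det(a):
    """Determinant by first-row expansion; det of the empty matrix is 1."""
    if not a:
        return 1
    return sum((-1) ** j * a[0][j] * det(minor_matrix(a, 0, j))
               for j in range(len(a)))

def cofactor(a, i, j):
    return (-1) ** (i + j) * det(minor_matrix(a, i, j))

def solve_cramer(a, k):
    """x_i = (1/|A|) * sum_j k_j * A_ji for a non-singular square A."""
    d = det(a)
    if d == 0:
        raise ValueError("A is singular: formula (1.25) does not apply")
    n = len(a)
    return [Fraction(sum(k[j] * cofactor(a, j, i) for j in range(n)), d)
            for i in range(n)]

# 2x + y = 5, x + 3y = 10
solution = solve_cramer([[2, 1], [1, 3]], [5, 10])
print(solution)  # [Fraction(1, 1), Fraction(3, 1)]
```

For larger systems elimination methods are far cheaper; co-factor expansion is used here only because it mirrors the formula in the text.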
In case A is singular or $m \ne n$, the system will be consistent, i.e.
will have a solution, if and only if

$$\text{rank } A = \text{rank } (A \mid k),$$
where $(A \mid k)$ is the augmented matrix obtained from A by attaching
the column of constants as an extra column. A system of
homogeneous equations is always consistent, for

$$\text{rank } A = \text{rank } (A \mid 0),$$

k being now the null vector. The solution $x_i = 0$, $i = 1, 2, \ldots, n$,
which always exists for a homogeneous system, is called the trivial
solution. Thus the important question for a homogeneous system is
the existence of a non-trivial solution. A homogeneous system of
linear equations has a non-trivial solution if, and only if, rank A < n.


1b CALCULUS 


1b.1 Concept of a function
If x and y are two variable quantities such that the value of y
depends on the value of x, then y is said to be a function of x.
Consider the following relations between x and y:
(i) $y = 2 + 3x + 6x^2$,
(ii) $y = 4\log x + 5$,
(iii) $y = \cos x$.
In each of the above cases, when x is given y can be determined,
though in none of these cases we have the individual values of x
and y. When such is the case, the expressions on the right-hand side
of the relations are called functions of x. For instance, the volume
of a cube depends on the length of its side: given the length of the
side, we know the volume exactly. Thus the volume of a cube is a
function of the length of its side.
The variable x, on whose value y depends, is called the independent
variable and y the dependent variable. It is customary to
indicate that y is a function of x by writing the independent variable
within brackets and prefixing a letter, e.g.

$$y = f(x) \text{ or } \phi(x) \text{ or } F(x), \text{ etc.}$$
f(a) denotes the value of y at x = a.


1b.2 Limit of a function

If there is a definite value A to which f(x) approaches as x
approaches the value a, then A is called the limit of f(x) as x tends
to a or approaches a. This is symbolically denoted by

$$\mathop{\mathrm{Lt}}_{x \to a} f(x) = A \quad \text{or} \quad \lim_{x \to a} f(x) = A.$$


When we say x tends to a, we mean that the difference between 
x and a becomes smaller and smaller and ultimately smaller than 
any given positive quantity, however small. The above definition 
of limit means that the difference between f(x) and A can be 
made arbitrarily small by making the difference between x and a 
sufficiently small. With the help of the modulus notation, this is 
stated as follows : 


If given $\varepsilon > 0$, however small, a $\delta$ can be found such that

$$|f(x) - A| < \varepsilon$$
whenever
$$0 < |x - a| < \delta,$$

then A is said to be the limit of f(x) as x tends to a.
Consider, for instance,

$$\lim_{x \to 1} (ax + b).$$

When x tends to 1, ax tends to a. Thus ax + b tends to a + b. So
$$\lim_{x \to 1} (ax + b) = a + b.$$


1b.3 Meaning of infinity ($\infty$)
The function 1/x is not defined for x = 0. But when x (> 0) is
made smaller and smaller, 1/x becomes greater and greater. Thus
as $x \to 0$ through positive values, 1/x does not tend to any number but
increases without bound. This phenomenon is expressed as
$$\lim_{x \to +0} 1/x = \infty.$$
Similarly, as $x \to 0$ taking negative values, 1/x remains a negative
quantity whose magnitude becomes greater and greater. In symbols,

$$\lim_{x \to -0} 1/x = -\infty.$$

If f(x) approaches A as x increases without bound, A is said to be
the limit of f(x) for x tending to $\infty$:

$$\lim_{x \to \infty} f(x) = A.$$


1b.4 General results on limits 

(1) The limit of the sum (or difference) of a finite number of
functions is equal to the sum (or difference) of the limits of the
individual functions. For instance,

$$\lim \{f_1(x) + f_2(x) - f_3(x)\} = \lim f_1(x) + \lim f_2(x) - \lim f_3(x).$$

(2) The limit of the product of a finite number of functions is
the product of their limits. For instance,
$$\lim \{f_1(x) \cdot f_2(x)\} = \lim f_1(x) \cdot \lim f_2(x).$$

(3) The limit of the quotient of two functions is the quotient of
their limits, provided the limit of the denominator does not vanish:

$$\lim \frac{f_1(x)}{f_2(x)} = \frac{\lim f_1(x)}{\lim f_2(x)}, \quad \text{provided } \lim f_2(x) \ne 0.$$




1b.5 Some important limits 


(i) $\lim_{n \to \infty} \left(1 + \dfrac{1}{n}\right)^n = e$; (ii) $\lim_{x \to 0} (1 + x)^{1/x} = e$;

(iii) $\lim_{x \to a} \dfrac{x^r - a^r}{x - a} = ra^{r-1}$ (where r is a rational number).
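
These limits can be watched numerically. The short sketch below is only illustrative: the function names and the particular values of n, x, a and r are assumptions, and floating-point arithmetic limits how close one can usefully push the arguments.

```python
import math

def compound(n):
    """(1 + 1/n)^n, which approaches e as n grows."""
    return (1 + 1 / n) ** n

def one_plus_x(x):
    """(1 + x)^(1/x), which approaches e as x tends to 0."""
    return (1 + x) ** (1 / x)

def power_quotient(x, a, r):
    """(x^r - a^r) / (x - a), which approaches r * a^(r-1) as x tends to a."""
    return (x ** r - a ** r) / (x - a)

for n in (10, 1000, 1000000):
    print(n, compound(n), math.e - compound(n))

print(one_plus_x(1e-6))
print(power_quotient(2 + 1e-6, 2, 1.5), 1.5 * 2 ** 0.5)
```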

1b.6 Continuity 

The function f(x) is said to be continuous at a point x = a if
$\lim_{x \to a} f(x)$ exists and equals f(a). Otherwise, f(x) is said to be
discontinuous at a. The function f(x) is continuous between a and b if it is
continuous at all points in that interval.

The function $f(x) = x^2$ is a continuous function, while

$$f(x) = \begin{cases} x^2 & \text{for } x \ne 0 \\ -1 & \text{for } x = 0 \end{cases}$$
is discontinuous at x = 0, since $\lim_{x \to 0} f(x) = 0 \ne f(0)$.


1b.7 Derivative of a function 
Given a function

$$y = f(x),$$

the limiting value
$$\lim_{h \to 0} \frac{f(a + h) - f(a)}{h},$$
when it exists, is called the derivative or the differential coefficient of
f(x) with respect to x at the point a. When the limit does not exist,
f(x) is not differentiable for that value of x.

Various symbols are used to denote the derivative of f(x) when it
is itself looked upon as a function of x:

$$\frac{dy}{dx}, \quad \frac{df(x)}{dx}, \quad \frac{d}{dx}f(x) \quad \text{or} \quad f'(x).$$
The derivative of $y = x^n$ is

$$\lim_{h \to 0} \frac{(x + h)^n - x^n}{h} = nx^{n-1}.$$

Again, the derivative of $y = e^x$ is

$$\lim_{h \to 0} \frac{e^{x+h} - e^x}{h} = e^x.$$


If f'(x) happens to be a differentiable function, then we can
differentiate f'(x) again as we did with f(x) and get the derivative
of f'(x). This is the second derivative or second differential coefficient of
f(x) and is denoted by
$$f''(x), \quad \text{i.e.} \quad \frac{d}{dx}\left(\frac{d}{dx}f(x)\right), \quad \text{or} \quad \frac{d^2f(x)}{dx^2} \quad \text{or} \quad \frac{d^2y}{dx^2}.$$
Similarly, we can define higher-order derivatives.
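
The defining limit of the derivative can be approximated by evaluating the difference quotient for a small h. This is a rough numerical sketch, not a method given in the text; the function name `difference_quotient` and the chosen step sizes are assumptions.

```python
import math

def difference_quotient(f, a, h):
    """(f(a + h) - f(a)) / h, whose limit as h -> 0 is the derivative at a."""
    return (f(a + h) - f(a)) / h

def cube(x):
    return x ** 3

# As h shrinks, the quotient for x^3 at x = 2 approaches 3 * 2^2 = 12,
# and the quotient for e^x at x = 1 approaches e.
for h in (0.1, 0.001, 0.000001):
    print(h, difference_quotient(cube, 2.0, h), difference_quotient(math.exp, 1.0, h))
```

Taking h extremely small does not help indefinitely on a computer, because rounding errors in f(a + h) − f(a) eventually dominate.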
1b.8 Partial derivatives 

If $u = f(x, y, z, \ldots)$ is a function of the variables x, y, z, ...,
then the limit
$$\lim_{h \to 0} \frac{f(x + h, y, z, \ldots) - f(x, y, z, \ldots)}{h},$$
if it exists when y, z, ... are regarded as constants, is called the first
partial derivative of f with respect to x and is denoted by $\dfrac{\partial f}{\partial x}$ or $f'_x$.
Similarly, the first partial derivative of f with respect to y, when the
other variables are held constant, is denoted by $\dfrac{\partial f}{\partial y}$ or $f'_y$. Similarly
for the other partial derivatives. The first partial derivatives $f'_x$,
$f'_y$, ... are again functions of x, y, z, ..., and if these functions
have further partial derivatives with respect to x, y, ..., we may
obtain the successive partial derivatives of higher orders. Thus

$$\frac{\partial}{\partial x}\left(\frac{\partial f}{\partial x}\right) = \frac{\partial^2 f}{\partial x^2} = f''_{xx},$$

$$\frac{\partial}{\partial x}\left(\frac{\partial f}{\partial y}\right) = \frac{\partial^2 f}{\partial x\,\partial y} = f''_{xy},$$

$$\frac{\partial}{\partial y}\left(\frac{\partial f}{\partial y}\right) = \frac{\partial^2 f}{\partial y^2} = f''_{yy},$$
and so on.


1b.9 Application of derivatives of functions 

Here we shall consider the behaviour of functions that depends 
on derivatives. In many cases it is possible to locate the maximum, 
the minimum or the point of inflection of a function with the help 
of its derivatives. 

We state below some definitions and results in this regard that 


will be useful in our later statistical work. 


Definition A function f(x) is said to have a local (or relative)
maximum at x = c if

$$f(c) > f(c + h),$$
for sufficiently small values of h (both positive and negative). For a
local (or relative) minimum at x = c,

$$f(c) < f(c + h)$$
for sufficiently small values of h (both positive and negative).
The function f(x) will have an absolute maximum at x = c in an
interval (a, b) if f(c) > f(x), for all x in (a, b) and not only for x
close to c. Similarly, there is an absolute minimum at x = c if

$$f(c) < f(x), \text{ for all } x \text{ in } (a, b).$$

Definition The point of a curve where it changes from concave
to convex or vice versa is called a point of inflection.

We state below some rules that are generally applicable in
locating a stationary value (i.e. a local maximum or a local minimum
or a point of inflection) at a point x = c. Also, as we are considering
application of derivatives, we assume that y = f(x) is differentiable
up to the required order in the interval of interest.

Rule 1. Test each value of x for which f'(x) = 0 to determine the
nature of stationary value. The usual tests are:
(a) If

$$f'(x) \begin{cases} > 0 & \text{for } x < c \\ = 0 & \text{for } x = c \\ < 0 & \text{for } x > c, \end{cases}$$
then a (local) maximum occurs at x = c. If f'(x) changes from
negative to positive through zero as x advances through c, then there is a
minimum at x = c. If f'(x) does not change sign as x passes through
c, then a maximum or a minimum need not occur at x = c.
(b) If f''(c) < 0 when f'(c) = 0, then a maximum occurs at x = c.
If f''(c) > 0 when f'(c) = 0, then a minimum occurs at x = c.
If f''(c) = 0 when f'(c) = 0, the above test fails. But if f''(c) = 0
and f''(x) changes sign as x passes through x = c, then x = c is a point
of inflection.
(c) If $f^{(n)}(c) \ne 0$ (for n > 2) and all the previous derivatives of
f(x) vanish at x = c, then
f(x) has a maximum at x = c if n be even and $f^{(n)}(c) < 0$,
f(x) has a minimum at x = c if n be even and $f^{(n)}(c) > 0$,
and f(x) has a point of inflection at x = c if n be odd.
(d) If the derivative does not exist at a point, then examine the
graph of y = f(x) near that point for a possible stationary value.
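
Tests (b) and (c) amount to finding the first derivative of order two or higher that does not vanish at c. The sketch below encodes that rule; it assumes the derivative values at c are supplied by the user (worked out by hand), and the name `classify_stationary` is hypothetical.

```python
def classify_stationary(derivs_at_c):
    """Classify x = c from [f'(c), f''(c), f'''(c), ...].

    The first non-vanishing derivative of order n >= 2 decides:
    n even and negative -> maximum, n even and positive -> minimum,
    n odd -> point of inflection.
    """
    if derivs_at_c[0] != 0:
        return "not stationary"
    for order, value in enumerate(derivs_at_c[1:], start=2):
        if value != 0:
            if order % 2 == 1:
                return "point of inflection"
            return "maximum" if value < 0 else "minimum"
    return "undetermined"

# f(x) = x^3 - 3x: f' = 3x^2 - 3, f'' = 6x
print(classify_stationary([0, 6]))         # x = 1
print(classify_stationary([0, -6]))        # x = -1
# f(x) = x^3 at x = 0: f' = 0, f'' = 0, f''' = 6
print(classify_stationary([0, 0, 6]))
# f(x) = x^4 at x = 0: f' = f'' = f''' = 0, f'''' = 24
print(classify_stationary([0, 0, 0, 24]))
```

If every supplied derivative vanishes, the function returns "undetermined", which corresponds to the case where more derivatives (or test (a)) are needed.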


Lagrange multipliers 

For the case where a stationary value is desired subject to certain
auxiliary conditions, we use a technique known as Lagrange's method
of undetermined multipliers.

Suppose $y = f(x_1, x_2, \ldots, x_n)$ is a function of n variables, $x_1, x_2,
\ldots, x_n$, and we want to find a stationary value of f subject to m < n
auxiliary conditions: $\phi_j(x_1, x_2, \ldots, x_n) = 0$, $j = 1, 2, \ldots, m$.

We use m constants $\lambda_j$, $j = 1, 2, \ldots, m$, called Lagrange
multipliers, and form the function

$$F = f + \sum_{j=1}^{m} \lambda_j\phi_j.$$
If the n first partial derivatives of F are required to vanish, i.e. if
$$\frac{\partial F}{\partial x_i} = 0, \quad i = 1, 2, \ldots, n,$$
then these n equations along with the m
conditions $\phi_j = 0$, $j = 1, 2, \ldots, m$, make it possible to determine
the (m + n) quantities, viz. $x_1, x_2, \ldots, x_n, \lambda_1, \lambda_2, \ldots, \lambda_m$, so that
f attains a stationary value for the values of $x_1, x_2, \ldots, x_n$ so
determined.

Then to decide on the nature of the stationary value we apply
appropriate tests.
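
As a small worked illustration of the method (the function and the condition below are chosen here purely as an example and do not come from the text), consider finding a stationary value of $f = x_1x_2$ subject to the single condition $x_1 + x_2 - 1 = 0$:

```latex
% Assumed example: n = 2 variables, m = 1 auxiliary condition.
\begin{align*}
  F &= x_1 x_2 + \lambda\,(x_1 + x_2 - 1), \\
  \frac{\partial F}{\partial x_1} &= x_2 + \lambda = 0, \qquad
  \frac{\partial F}{\partial x_2} = x_1 + \lambda = 0, \\
  \text{so } x_1 &= x_2 = -\lambda, \text{ and the condition } x_1 + x_2 = 1
  \text{ gives } x_1 = x_2 = \tfrac12,\ \lambda = -\tfrac12, \\
  f &= \tfrac12 \cdot \tfrac12 = \tfrac14 .
\end{align*}
```

Here the nature of the stationary value is easy to settle directly: on the condition, $f = x_1(1 - x_1)$, whose second derivative is −2, so the value 1/4 is a maximum.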


1b.10 Definite integral 

Let $\phi(x)$ be bounded in the finite interval (a, b). Let the
interval (a, b) be divided in any manner into n sub-intervals, equal
or not, of width $h_1, h_2, \ldots, h_n$. In these sub-intervals, choose
perfectly arbitrary points, say $r_1, r_2, \ldots, r_n$, respectively. If as
$n \to \infty$, with the width of the intervals tending to zero, the sum

$$h_1\phi(r_1) + h_2\phi(r_2) + \cdots + h_n\phi(r_n)$$
tends to a limit, then that limit is called the definite integral of $\phi(x)$
between a and b and is denoted by
$$\int_a^b \phi(x)\,dx.$$
This is read as 'the integral of $\phi(x)\,dx$ from a to b'.
Geometrically, it can be seen that $\int_a^b \phi(x)\,dx$ is nothing but the
area bounded by the curve $y = \phi(x)$, the x-axis and the two ordinates
at a and b.
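
The sum in this definition is easy to compute for a given subdivision. The sketch below makes two simplifying assumptions that the definition does not require: the sub-intervals have equal width and the chosen point in each is its mid-point. The name `riemann_sum` is illustrative.

```python
def riemann_sum(phi, a, b, n):
    """Sum of h * phi(r_i) over n equal sub-intervals of (a, b),
    with r_i taken as the mid-point of the ith sub-interval."""
    h = (b - a) / n
    return sum(h * phi(a + (i + 0.5) * h) for i in range(n))

def square(x):
    return x * x

# The sums approach 1/3, the area under y = x^2 between 0 and 1.
for n in (10, 100, 1000):
    print(n, riemann_sum(square, 0.0, 1.0, n))
```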


1b.11 Infinite or improper integral 
When the limits a and/or b are infinite or the integrand $\phi(x)$ has
infinite discontinuity, i.e. tends to infinity at one or more points
in the finite range (a, b), the integral is called an infinite or improper
integral.

When the range is infinite, the integral can be evaluated in the
following manner:

(i) $\int_a^{\infty} \phi(x)\,dx = \lim_{b \to \infty} \int_a^b \phi(x)\,dx$, b being any number greater
than a;
(ii) $\int_{-\infty}^{b} \phi(x)\,dx = \lim_{a \to -\infty} \int_a^b \phi(x)\,dx$, a being any number less than b;

provided the limits exist. When these limits are finite, we say that
the infinite integrals exist or are convergent. But if the limits are $+\infty$
or $-\infty$, we say the infinite integrals do not exist or are divergent.
(iii) When, for an arbitrary finite point c, the infinite integrals

$$\int_{-\infty}^{c} \phi(x)\,dx \quad \text{and} \quad \int_{c}^{\infty} \phi(x)\,dx$$

both converge in the above sense, then we say that the infinite
integral $\int_{-\infty}^{\infty} \phi(x)\,dx$ is convergent and is given by
$$\int_{-\infty}^{\infty} \phi(x)\,dx = \int_{-\infty}^{c} \phi(x)\,dx + \int_{c}^{\infty} \phi(x)\,dx,$$
c being any intermediate finite point.


Next, let us consider the case where the integrand $\phi(x)$ tends to
infinity at a finite number of points.
If a is the only point of infinite discontinuity, then the infinite
integral $\int_a^b \phi(x)\,dx$ is said to exist or converge if $\lim_{\varepsilon \to 0} \int_{a+\varepsilon}^{b} \phi(x)\,dx$ exists
and is finite for arbitrary $\varepsilon > 0$.

Similarly, if b is the only point of infinite discontinuity of $\phi(x)$,
then $\int_a^b \phi(x)\,dx$ is convergent if $\lim_{\varepsilon \to 0} \int_a^{b-\varepsilon} \phi(x)\,dx$ exists and is finite for
arbitrary $\varepsilon > 0$. If a, b are both points of infinite discontinuity, then
the infinite integral is said to converge if

$$\int_a^b \phi(x)\,dx = \int_a^c \phi(x)\,dx + \int_c^b \phi(x)\,dx, \text{ for arbitrary } c \text{ in } (a, b),$$

provided the infinite integrals on the right-hand side exist and are
finite. If $\phi(x)$ has infinite discontinuity at an intermediate point c,
a < c < b, then the infinite integral is defined by

$$\int_a^b \phi(x)\,dx = \lim_{\varepsilon \to 0} \int_a^{c-\varepsilon} \phi(x)\,dx + \lim_{\varepsilon' \to 0} \int_{c+\varepsilon'}^{b} \phi(x)\,dx,$$

provided the limits on the right-hand side exist for arbitrary $\varepsilon, \varepsilon' > 0$.

Sometimes definite limits exist only for $\varepsilon = \varepsilon'$ and then the limit
is called the principal value of the infinite integral.


1b.12 Indefinite integral 

The process of finding the integral of a function is called
integration. It can be shown that integration is the inverse of
differentiation. Thus if y = f(x) be a function with derivative f'(x), then
the integral of f'(x) with respect to x between a and b is f(b) − f(a).
So, for integrating a function $\phi(x)$, we look for an expression f(x)
whose derivative is equal to that function. Then the function f(x)
is called the indefinite integral of $\phi(x)$ and is denoted by $\int \phi(x)\,dx$.

Now, since $\dfrac{d}{dx}\{f(x) + c\} = f'(x)$, where c is a constant, we find
that the integral of f'(x) is f(x) + c. Hence f(x) + c also is an
indefinite integral. The constant c must not be forgotten; it is
because of its presence that we call the integral an indefinite one.
Symbolically, if

$$\frac{d}{dx}\{f(x) + c\} = f'(x) = \phi(x), \text{ say,}$$

then
$$\int \phi(x)\,dx = f(x) + c,$$

where $\phi(x)$ is called the integrand and $\int \phi(x)\,dx$ is read as the
'indefinite integral of $\phi(x)$ with respect to x'. The method we have
considered for integration is a matter of inspired guessing; but there
are certain rules of integration which are helpful and at the same
time simple.

Some important integrals are:

$$\int x^n\,dx = \frac{x^{n+1}}{n+1} + c, \quad \text{provided } n \ne -1;$$
$$\int e^x\,dx = e^x + c;$$
$$\int \frac{dx}{x} = \log_e x + c.$$


1b.13 Double and multiple integral

Let f(x, y) be a function of the variables x and y. Suppose f(x, y)
is bounded for a set of points R of the (x, y) plane. (In particular, R
may be a rectangle: $R = \{(x, y) \mid a \le x \le b,\ c \le y \le d\}$.) Let us subdivide
R into sub-rectangles $P_{rs}$ by lines parallel to the x- and y-axes.
Let M and m be the upper and lower bounds of f(x, y) in R, while
$M_{rs}$ and $m_{rs}$ are the corresponding bounds in $P_{rs}$.

We form the upper (U) and lower (L) Riemann sums

$$U = \sum_r\sum_s M_{rs}\,p_{rs}, \quad L = \sum_r\sum_s m_{rs}\,p_{rs},$$

where $p_{rs}$ denotes the area of the corresponding sub-rectangle $P_{rs}$.

If the lower bound of the upper sums tends to the upper bound
of the lower sums, which bounds exist, when the number of
sub-rectangles is increased indefinitely such that the area of each
sub-rectangle tends to zero, then we say f(x, y) is integrable over R.
The common value of the two bounds is the double integral of f(x, y)
over R and is denoted by

$$\iint_R f(x, y)\,dx\,dy.$$

Double integral as a repeated integral:

If $\int_c^d\!\int_a^b f(x, y)\,dx\,dy$, the double integral over the rectangle
$\{(x, y) \mid a \le x \le b,\ c \le y \le d\}$, exists and if

either (1) $\int_a^b f(x, y)\,dx$ exists for all y in (c, d)
or (2) $\int_c^d f(x, y)\,dy$ exists for all x in (a, b),
then
$$\int_c^d\!\int_a^b f(x, y)\,dx\,dy = \int_c^d dy \int_a^b f(x, y)\,dx = \int_a^b dx \int_c^d f(x, y)\,dy.$$

Transformation of integral:
Integrals may often be easily evaluated by a change of variables.
Let us evaluate the integral

$$\iint_R f(x_1, x_2)\,dx_1\,dx_2$$
by a change of variables defined by
$$x_1 = g_1(y_1, y_2) \quad \text{and} \quad x_2 = g_2(y_1, y_2).$$
Here we assume that the transformation is one-to-one and that
the first partial derivatives exist.
Let S be the image of R in the $(y_1, y_2)$ plane. Let J be the
Jacobian of the transformation, defined by
$$J = \frac{\partial(g_1, g_2)}{\partial(y_1, y_2)} = \begin{vmatrix} \dfrac{\partial g_1}{\partial y_1} & \dfrac{\partial g_1}{\partial y_2} \\[2mm] \dfrac{\partial g_2}{\partial y_1} & \dfrac{\partial g_2}{\partial y_2} \end{vmatrix}.$$

The transformed integral is then
$$\iint_R f(x_1, x_2)\,dx_1\,dx_2 = \iint_S f(x_1, x_2)\,|J|\,dy_1\,dy_2,$$

where on the right-hand side $x_1$ and $x_2$ are to be replaced by
$g_1(y_1, y_2)$ and $g_2(y_1, y_2)$.
The above results can be easily extended to the case of triple and
multiple integrals.


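Let $f(x_1, x_2, \ldots, x_n)$ be an integrable function of the n
independent variables $x_1, x_2, \ldots, x_n$; then an n-ple integral is
denoted by

$$\int\!\!\int \cdots \int_R f(x_1, x_2, \ldots, x_n)\,dx_1\,dx_2 \cdots dx_n,$$
where R is a region in the $(x_1, x_2, \ldots, x_n)$ space.
Let us apply the one-to-one transformation:
$$x_i = g_i(y_1, y_2, \ldots, y_n), \quad i = 1, 2, \ldots, n,$$

where it is assumed that the first partial derivatives of $g_i$ exist.
Then the Jacobian is
$$J = \frac{\partial(g_1, g_2, \ldots, g_n)}{\partial(y_1, y_2, \ldots, y_n)} = \begin{vmatrix} \dfrac{\partial g_1}{\partial y_1} & \dfrac{\partial g_1}{\partial y_2} & \cdots & \dfrac{\partial g_1}{\partial y_n} \\ \vdots & \vdots & & \vdots \\ \dfrac{\partial g_n}{\partial y_1} & \dfrac{\partial g_n}{\partial y_2} & \cdots & \dfrac{\partial g_n}{\partial y_n} \end{vmatrix}$$
and
$$\int\!\!\int \cdots \int_R f(x_1, x_2, \ldots, x_n)\,dx_1 \cdots dx_n = \int\!\!\int \cdots \int_S f(x_1, x_2, \ldots, x_n)\,|J|\,dy_1\,dy_2 \cdots dy_n,$$
where S is the image of R in the $(y_1, y_2, \ldots, y_n)$ space and
$x_1, x_2, \ldots, x_n$ on the right are to be replaced by $g_1(y_1, y_2, \ldots, y_n)$,
$g_2(y_1, y_2, \ldots, y_n)$, ..., $g_n(y_1, y_2, \ldots, y_n)$.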


1b.14 Some special integrals 
Beta function : 
The definite integral

$$\int_0^1 x^{m-1}(1 - x)^{n-1}\,dx$$

is convergent for m > 0, n > 0.
We denote this, a function of m and n, by B(m, n) and call it the
beta function or the first Eulerian integral. Thus

$$B(m, n) = \int_0^1 x^{m-1}(1 - x)^{n-1}\,dx, \quad m > 0,\ n > 0. \qquad \ldots\ (1.26)$$
Properties of the beta function:
(1) If we write y = 1 − x, then

$$B(m, n) = \int_0^1 x^{m-1}(1 - x)^{n-1}\,dx = \int_0^1 y^{n-1}(1 - y)^{m-1}\,dy = B(n, m).$$
(2) Substituting $x = \sin^2\theta$, so that $dx = 2\sin\theta\cos\theta\,d\theta$, B(m, n) may
be written in the following form:

$$B(m, n) = 2\int_0^{\pi/2} \sin^{2m-1}\theta\,\cos^{2n-1}\theta\,d\theta.$$
Gamma function:

The integral

$$\int_0^{\infty} \exp[-x]\,x^{n-1}\,dx$$

is convergent for n > 0. This integral, a function of n, is denoted
by $\Gamma(n)$ and is called the gamma function or the second Eulerian integral.
Thus

$$\Gamma(n) = \int_0^{\infty} \exp[-x]\,x^{n-1}\,dx, \quad n > 0. \qquad \ldots\ (1.27)$$
Properties of the gamma function:
(1) If we integrate by parts, we get
$$\Gamma(n) = (n - 1)\Gamma(n - 1). \qquad \ldots\ (1.28)$$
From this it follows that if n is a positive integer, then
$$\Gamma(n) = (n - 1)!,$$
while if n > 0 is not an integer, then
$$\Gamma(n) = (n - 1)(n - 2) \cdots p\,\Gamma(p),$$
where 0 < p < 1.


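(2) $B(m, n) = \dfrac{\Gamma(m)\Gamma(n)}{\Gamma(m + n)}$ for m > 0, n > 0.
We have

$$\Gamma(m)\Gamma(n) = \left(\int_0^{\infty} \exp[-x]\,x^{m-1}\,dx\right)\left(\int_0^{\infty} \exp[-y]\,y^{n-1}\,dy\right) = \int_0^{\infty}\!\!\int_0^{\infty} \exp[-(x + y)]\,x^{m-1}y^{n-1}\,dx\,dy.$$

Applying the transformation

$$x = vu, \quad y = v(1 - u),$$
which has the Jacobian
$$\frac{\partial(x, y)}{\partial(u, v)} = v,$$

we obtain
$$\Gamma(m)\Gamma(n) = \int_0^{\infty}\!\!\int_0^1 u^{m-1}(1 - u)^{n-1}\,v^{m+n-1}\exp[-v]\,du\,dv$$

(since $0 < x < \infty$, $0 < y < \infty$ if and only if
$0 < u < 1$, $0 < v < \infty$)

$$= \left(\int_0^{\infty} \exp[-v]\,v^{m+n-1}\,dv\right)\left(\int_0^1 u^{m-1}(1 - u)^{n-1}\,du\right) = \Gamma(m + n)B(m, n),$$
which shows that

$$B(m, n) = \frac{\Gamma(m)\Gamma(n)}{\Gamma(m + n)}. \qquad \ldots\ (1.29)$$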
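(3) $\int_0^{\infty} \exp[-ax]\,x^{n-1}\,dx = \dfrac{\Gamma(n)}{a^n}$, n > 0, a > 0.
On substituting

$$y = ax,$$
we have

$$\int_0^{\infty} \exp[-ax]\,x^{n-1}\,dx = \frac{1}{a^n}\int_0^{\infty} \exp[-y]\,y^{n-1}\,dy = \frac{\Gamma(n)}{a^n}.$$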


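(4) We have

$$\Gamma(\tfrac12) = \int_0^{\infty} \exp[-x]\,x^{-1/2}\,dx.$$
Hence

$$[\Gamma(\tfrac12)]^2 = \int_0^{\infty}\!\!\int_0^{\infty} \exp[-(x + y)]\,x^{-1/2}y^{-1/2}\,dx\,dy.$$

On making the transformation
$$x = z\cos^2\theta, \quad y = z\sin^2\theta \quad (0 < z < \infty,\ 0 < \theta < \pi/2),$$

for which the Jacobian is

$$\frac{\partial(x, y)}{\partial(z, \theta)} = 2z\sin\theta\cos\theta,$$

we have
$$[\Gamma(\tfrac12)]^2 = 2\int_0^{\pi/2}\!\!\int_0^{\infty} \exp[-z]\,dz\,d\theta = 2\left[\int_0^{\infty} \exp[-z]\,dz\right]\left[\int_0^{\pi/2} d\theta\right] = 2 \times 1 \times \pi/2 = \pi.$$

Hence

$$\Gamma(\tfrac12) = \sqrt{\pi}.$$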
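Consider the transformation
$$\left.\begin{aligned} x_1 &= \cos\theta_1, \\ x_2 &= \sin\theta_1\cos\theta_2, \\ x_3 &= \sin\theta_1\sin\theta_2\cos\theta_3, \\ &\ \ \vdots \\ x_n &= \sin\theta_1\sin\theta_2 \cdots \sin\theta_{n-1}\cos\theta_n. \end{aligned}\right\}$$

The Jacobian of the transformation is $(-1)^n(\sin\theta_1)^n(\sin\theta_2)^{n-1} \cdots
\sin\theta_n$, and $0 < \theta_i < \pi$ ($i = 1, 2, \ldots, n$) is the image of the
n-dimensional unit sphere $\sum_i x_i^2 \le 1$.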

i 


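Using the result that

$$\int_0^{\pi} (\sin\theta)^k\,d\theta = B\left(\frac{k + 1}{2}, \frac12\right) = \frac{\Gamma\left(\frac{k+1}{2}\right)\Gamma\left(\frac12\right)}{\Gamma\left(\frac{k+2}{2}\right)},$$

we have
$$\int\!\!\int \cdots \int_{\sum x_i^2 \le 1} dx_1\,dx_2 \cdots dx_n = \int_0^{\pi} (\sin\theta_1)^n\,d\theta_1 \int_0^{\pi} (\sin\theta_2)^{n-1}\,d\theta_2 \cdots \int_0^{\pi} \sin\theta_n\,d\theta_n$$
$$= \frac{\Gamma\left(\frac{n+1}{2}\right)\Gamma\left(\frac12\right)}{\Gamma\left(\frac{n+2}{2}\right)} \cdot \frac{\Gamma\left(\frac{n}{2}\right)\Gamma\left(\frac12\right)}{\Gamma\left(\frac{n+1}{2}\right)} \cdots \frac{\Gamma(1)\Gamma\left(\frac12\right)}{\Gamma\left(\frac32\right)} = \frac{\pi^{n/2}}{\Gamma\left(\frac{n}{2} + 1\right)}.$$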

1c STIRLING'S APPROXIMATION

We are often faced with the problem of evaluating factorial
expressions which are laborious to compute directly from definition.
However, for large n there exists a useful approximation to n!, which
is due to Stirling. It is

$$n! \simeq \sqrt{2\pi}\,\exp[-n]\,n^{n+1/2}. \qquad \ldots\ (1.30)$$
A better approximation is given by
$$n! \simeq \sqrt{2\pi}\,\exp[-n + 1/12n]\,n^{n+1/2}. \qquad \ldots\ (1.30a)$$


In each case, the percentage error decreases as n increases. 
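
The behaviour of the percentage error can be seen by comparing (1.30) and (1.30a) with exact factorials. This is a minimal sketch; the function names are assumptions, and the values of n are kept small because `n ** (n + 0.5)` overflows ordinary floating-point numbers for n much beyond 140.

```python
import math

def stirling(n):
    """Approximation (1.30): sqrt(2*pi) * exp(-n) * n^(n + 1/2)."""
    return math.sqrt(2 * math.pi) * math.exp(-n) * n ** (n + 0.5)

def stirling_improved(n):
    """Approximation (1.30a), with the extra 1/(12n) in the exponent."""
    return math.sqrt(2 * math.pi) * math.exp(-n + 1 / (12 * n)) * n ** (n + 0.5)

def percentage_error(approx, exact):
    return 100 * abs(approx - exact) / exact

for n in (5, 10, 50):
    exact = math.factorial(n)
    print(n,
          percentage_error(stirling(n), exact),
          percentage_error(stirling_improved(n), exact))
```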


2 NUMERICAL ANALYSIS


Many mathematical and scientific problems are such that they 
cannot be solved by existing analytical methods or the solutions, 
even when they exist, are so complex that they will not lead to the 
desired numerical information. In such cases, the desired result may 
be obtained by purely numerical methods. Thus the numerical 
methods are concerned with the practical object of obtaining an 
approximate solution to the stated problem which is correct to a 
certain degree of accuracy. 

The numerical data used in solving problems are usually not exact. 
They are usually correct to a certain number of decimal places. 
Not only are the data approximate ; the methods and processes that 
are applied for getting a numerical solution are also approximate. 
So it is evident that our results will not be exact owing to the 
approximate nature of the data and of the methods. Errors of data 
cannot be avoided, but the errors of calculations may be made small. 

In the first section, we shall give some idea of the errors and 
approximations in numerical calculations. And in subsequent 
sections, we shall consider different numerical methods for solving 
problems of various types. 


2a INACCURACIES AND APPROXIMATIONS 


2a.1 Different types of inaccuracies
We differentiate between various types of inaccuracies. A blunder
will usually be committed by a person who is not familiar with the
method. It is a gross inaccuracy. But even when we are familiar
with the method and type of computation involved, we may make
mistakes. Some of the common mistakes are the following:
(i) Copying mistakes, e.g. writing 23,480 as 23,408.
(ii) Mistakes in decimal points, e.g. taking 12.3/30 as 4.1.
(iii) Incorrectly reading a table, e.g. reading from a wrong column.
(iv) Faulty memorisation of values of constants, such as $\pi$, $\sqrt{2}$, etc.
(v) Mistakes that occur with an untidy and careless worker.




Whenever a mistake is found, it should be carefully corrected. 
Whenever possible, we should check the calculations at all stages. 
Neat work in appropriate tabular form reduces the chances of 
mistakes. 

If we know our job and are careful, then we may avoid blunders 
and mistakes. But even then there will be a third type of inaccuracy, 
called errors, which will be sometimes impossible to avoid. Errors 
may arise from the following sources : 

(i) Errors in original data. The observations taken may have 
been rounded off and so are approximate. 

(ii) Errors due to the approximate nature of the formula. The 
mathematical formulation is mostly an idealised description of the 
problem. Sometimes we replace an infinite series by a finite number 
of terms. 

(iii) Rounding off errors. This is unavoidable in some calcula- 
tions. For some of the quantities when expressed in decimals will 
be non-terminating, and also the capacity of the computing machines 
is limited. So in such cases we are to terminate after a number of 
places. These errors may accumulate as calculations proceed. 


2a.2 Rounding off 
Some quantities may not terminate and some may terminate
after a large number of decimal places. Owing to the limited capacity
of a computing machine, we can retain only a few of the digits.
The process of cutting off superfluous digits and retaining only the
desired number of digits is called rounding off. To round off a
number to n digits, we replace all digits to the right of the nth digit
(counting from the left) by zero. If the discarded number contributes
less than half a unit in the nth place, we leave the nth digit
unchanged ; if the discarded number contributes more than half a
unit, we increase the nth digit by unity. In case the discarded
number is exactly half a unit in the nth place, we leave the nth digit
unaltered if it is even, but increase it by unity if it is odd. The
numbers 27,598, 1.467205, 45.765, 2.0675 when rounded off to four
digits become 27,600, 1.467, 45.76, 2.068, respectively.
When the above rule is followed, the rounding off errors will be
mostly eliminated by cancellations.
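
The rounding rule just stated (ties go to the even digit) can be tried out with a short sketch. The function name round_sig is an illustrative assumption and not part of the text; the sketch relies on Python's standard decimal module, whose ROUND_HALF_EVEN mode follows the same tie rule.

```python
from decimal import Decimal, ROUND_HALF_EVEN

def round_sig(value: str, n: int) -> Decimal:
    """Round a decimal string to n significant digits, ties going to the even digit."""
    d = Decimal(value)
    if d == 0:
        return d
    # Power of ten corresponding to the position of the nth significant digit.
    exponent = d.adjusted() - n + 1
    return d.quantize(Decimal(1).scaleb(exponent), rounding=ROUND_HALF_EVEN)

# The four numbers used in the text, each rounded off to four digits.
for s in ["27598", "1.467205", "45.765", "2.0675"]:
    print(s, "->", round_sig(s, 4))
```

Passing the numbers as strings avoids any binary floating-point representation error before the rounding is applied.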




2a.3 Significant figures 

The digits 1, 2, 3, ......, 9 are significant figures. 0 is also a
significant figure if it is not used to fix the decimal point or to denote
the discarded digits. The number of significant figures in a number
expressed in the decimal form refers to the number of digits, starting
from the left with the first non-zero digit and proceeding to the right,
that are assumed to be correct. Thus, in the number 0.0001408 the
number of significant figures is only four, viz. 1, 4, 8 and 0 (the 0
between 4 and 8). The three 0's before the first significant figure,
viz. 1, are not significant figures, for they are needed to fix the
decimal point. In a number like 64,200 there is nothing to tell us
whether the two 0's at the end are significant figures. To specify the
number of significant figures in such a case, it is customary to write
the number in the form $6.42 \times 10^4$, $6.420 \times 10^4$, $6.4200 \times 10^4$ to
indicate that the number of significant figures is three, four and five,
respectively.


2a.4 Absolute, relative and percentage errors 
If m is the approximate value of a quantity whose true value
(not necessarily known) is m*, then the absolute error modulus of m is
defined as
$$|e| = |m - m^*|. \tag{2.1}$$
It is the British practice to define the absolute error of m as
$$e = m - m^*, \tag{2.2}$$
while in other countries it is defined by
$$e = m^* - m. \tag{2.3}$$
Usually, the magnitude of e is more important than its sign.
The absolute error has the dimension of the quantity, and a
better measure of error is given by the dimensionless quantity r,
called the relative error, defined by
$$r = \frac{|m - m^*|}{|m^*|}. \tag{2.4}$$
A somewhat similar measure is
$$\frac{|m - m^*|}{|m|}. \tag{2.4a}$$
The percentage error is 100r.




The absolute error of a number correct to n significant figures
cannot be more than half a unit in the last significant figure. For
example, if 1.467 is correct to four significant figures, then its absolute
error is not greater than 0.0005.

Let $m^* = f(m_1^*, m_2^*, \ldots, m_p^*)$ denote a function of several
independent quantities $m_1^*, m_2^*, \ldots, m_p^*$. If m* is evaluated by
using approximate quantities $m_1, m_2, \ldots, m_p$, which are subject to
absolute errors $e_1, e_2, \ldots, e_p$, respectively, then these errors will
cause an error e in the function m*. By Taylor's theorem for a
function of several variables, we get
$$m = f(m_1, m_2, \ldots, m_p) = f(m_1^*, m_2^*, \ldots, m_p^*) + \sum_{i=1}^{p} (m_i - m_i^*) f_i^*$$
(neglecting squares, products and higher powers of $m_i - m_i^*$), where
$f_i^*$ are the values of the partial derivatives $\partial f/\partial m_i$ evaluated at the
point $(m_1^*, m_2^*, \ldots, m_p^*)$. Then the absolute error e in m is given
approximately by
$$e = \sum_{i=1}^{p} f_i^* e_i. \tag{2.5}$$
The absolute error modulus of m is
$$|e| \le \sum_{i=1}^{p} |e_i f_i^*|. \tag{2.6}$$
This is the general result for determining the error of a function, and
as corollaries we obtain the following :

The absolute error modulus in a sum (difference) is less than
or equal to the sum of the absolute error moduli of the separate
terms.

The relative error in a product (quotient) is less than or equal to 
the sum of the relative errors in the various factors. 

The following two fundamental theorems proved in [5] give the
relation between the relative error and the number of significant
figures in a number :

THEOREM 2.1 If the first significant figure of a number is k and
the number is correct to n significant figures, then the relative error
is less than $1/(k \times 10^{n-1})$.

The converse result is given in the other theorem.

THEOREM 2.2 If the relative error in an approximate number
is less than $1/[(k+1) \times 10^{n-1}]$, then the number is correct to n
significant figures or is at least in error by less than a unit in the nth
significant figure.

Here also k is the first significant figure of the number.

In practice, the exact value of the absolute error modulus is
not known, but it is known that for a number which has been
rounded off to n decimals, this is not greater than $\tfrac{1}{2} \times 10^{-n}$. The
errors as given by the above rules are the upper bounds to the errors
that may occur in a computation. And these will be attained if the
different errors reinforce each other. Sometimes the signs of errors
will be such as to cancel each other and the actual error will be less
than this upper bound.

We show below by examples how to calculate the sum, difference, 
product or quotient of numbers of various accuracies. 


Example 2.1 Let us calculate the greatest value of the absolute 
error modulus of each of the expressions given below and then round 
off the results to the appropriate number of figures. Each one of 
the numbers is rounded off. 

(1) 4.12 + 0.768 − 2.71345,
(2) 2.145 × 0.45,
(3) 0.468/4.3712.

(1) In this case, the greatest value of the absolute error modulus
is $|e| \le |e_1| + |e_2| + |e_3|$. The numbers 4.12, 0.768 and −2.71345
are rounded off. So $|e_1| \le \tfrac{1}{2} \times 10^{-2}$, $|e_2| \le \tfrac{1}{2} \times 10^{-3}$, and $|e_3| \le \tfrac{1}{2} \times 10^{-5}$.
Note that the main contribution to $|e|$ is due to $|e_1|$, the error of
the number which is rounded off to two decimals only.

Now,
$$|e| \le \tfrac{1}{2} \times 10^{-2} + \tfrac{1}{2} \times 10^{-3} + \tfrac{1}{2} \times 10^{-5} = 0.5505 \times 10^{-2}.$$

As 4.12 + 0.768 − 2.71345 = 2.17455, the true value lies in the
range 2.17455 ± 0.005505, i.e. lies between 2.169045 and 2.180055. So
the result may be correctly rounded off to 2.2 or it may be rounded off
to 2.17 with a possible error of one unit in the second decimal place.

Thus we find that in a sum (or difference) we do not get our result 
correct even up to the number of decimals of the number having 
the smallest accuracy, so it will not pay to retain too many places in 




the other numbers which may be more accurate. In such cases, we 
retain one more place in all the numbers than the number of places 
in the least accurate number and finally round off the result to the 
same place as that of the least accurate one and our result may be 
in error by one unit in that place. 

(2) The product is 0.96525, and the maximum value of the
relative error is
$$\frac{\tfrac{1}{2} \times 10^{-3}}{2.145} + \frac{\tfrac{1}{2} \times 10^{-2}}{0.45} = 0.0002 + 0.0111 = 0.0113,$$
while the absolute error modulus is ≤ 0.965 × 0.0113 = 0.0109.

Thus the product lies in the range 0.9652 ± 0.0109, i.e. between 0.9543
and 0.9761, and so the product may be correctly rounded off to 1.0
or may be rounded off to 0.97 with a possible error of two units in
the second decimal place. Here the factor with the larger relative
error is 0.45 and it determines the number of figures that will be
trustworthy in the final result.

(3) The quotient is 0.10706, and the maximum value of the
relative error is
$$\frac{\tfrac{1}{2} \times 10^{-3}}{0.468} + \frac{\tfrac{1}{2} \times 10^{-4}}{4.3712} = 0.00106 + 0.00001 = 0.00107,$$
while the absolute error modulus will be (0.107)(0.00107) = 0.00011.

Thus the quotient lies in the range 0.10706 ± 0.00011, i.e. 0.10695
to 0.10717, so it may be correctly rounded off to 0.107.
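
The bounds of Example 2.1 can be re-computed with the short sketch below. The helper names half_unit, sum_bound and relative_bound are illustrative assumptions; the sketch only applies the two corollaries stated earlier (absolute errors add in a sum, relative errors add in a product or quotient). Because it carries full precision instead of the rounded intermediate figures used above, its last digits may differ slightly from those in the text.

```python
def half_unit(decimals: int) -> float:
    """Largest absolute error of a number rounded off to `decimals` places."""
    return 0.5 * 10 ** (-decimals)

def sum_bound(abs_errors):
    """Bound on the absolute error of a sum or difference."""
    return sum(abs_errors)

def relative_bound(factors):
    """Bound on the relative error of a product or quotient.

    `factors` is a list of (value, absolute error bound) pairs.
    """
    return sum(err / abs(value) for value, err in factors)

# (1) 4.12 + 0.768 - 2.71345
bound_1 = sum_bound([half_unit(2), half_unit(3), half_unit(5)])

# (2) 2.145 x 0.45
rel_2 = relative_bound([(2.145, half_unit(3)), (0.45, half_unit(2))])
abs_2 = abs(2.145 * 0.45) * rel_2

# (3) 0.468 / 4.3712
rel_3 = relative_bound([(0.468, half_unit(3)), (4.3712, half_unit(4))])
abs_3 = abs(0.468 / 4.3712) * rel_3

print(bound_1, rel_2, abs_2, rel_3, abs_3)
```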

Errors sometimes occur from the loss of significant figures by 
subtraction of almost equal numbers unless we retain in the 
beginning more significant figures. We demonstrate this with the 
help of the following example. 


Example 2.2 Find the difference $\sqrt{2.1} - \sqrt{2}$ correct to three
significant figures.
We take
$$\sqrt{2.1} = 1.44914 \quad \text{and} \quad \sqrt{2} = 1.41421.$$
Then
$$\sqrt{2.1} - \sqrt{2} = 1.44914 - 1.41421 = 0.03493.$$
The result is then 0.0349. Note that, to start with, we took more
significant figures than are needed in the final result. If we took in
the beginning only three significant figures (1.45 and 1.41, giving
the difference 0.04), the first significant figure
of the result would be in error and we would also not get the
required number of significant figures.
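
The loss of significant figures in Example 2.2 can be reproduced with a minimal sketch. The function name rounded_difference is an assumption made for illustration; it rounds both operands to a fixed number of decimal places before subtracting, which mimics working with a limited number of retained figures.

```python
import math

def rounded_difference(a: float, b: float, decimals: int) -> float:
    """Round both operands to `decimals` places first, then subtract."""
    return round(round(a, decimals) - round(b, decimals), decimals)

a, b = math.sqrt(2.1), math.sqrt(2.0)

# Six significant figures retained in each square root.
print(rounded_difference(a, b, 5))
# Only three significant figures retained in each square root.
print(rounded_difference(a, b, 2))
```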




2b INTERPOLATION 
2b.1 The problem of interpolation 
The following table gives the values of log x corresponding to 
certain values of x : 


    x      log x

   95    1.9777236
   96    1.9822712
   97    1.9867717
   98    1.9912261
   99    1.9956352


[Note that we are following the convention of using the symbols
log x and ln x to denote, respectively, the common logarithm (i.e.
with base 10) and the natural logarithm (i.e. with base e) of the
positive real number x.]

Suppose in a certain computational work it is required to obtain
the value of log 96.45. It is not directly available from the above
table, but can be obtained by a process known as interpolation. 

The general problem may be stated as follows : 

A function y = f(x) is known for certain values of the argument*
$x_1 < x_2 < \cdots < x_n$, the corresponding values of f(x) being
$f(x_1), f(x_2), \ldots, f(x_n)$. We are to find the value of f(x) for some other
value of the argument lying between $x_1$ and $x_n$.

The function will usually be of an unknown form or, even if
known, of a complicated nature. Hence to solve the problem of
interpolation we replace f(x) by a simpler function, say $\phi(x)$, of
known form. This $\phi(x)$ is so chosen that
$$\phi(x_1) = f(x_1),\ \phi(x_2) = f(x_2),\ \ldots,\ \phi(x_n) = f(x_n).$$

Then, for any other value of x, say x', the value of $\phi(x')$ is taken
to be an approximation to f(x').

The function $\phi(x)$ is known as an interpolation formula.

We shall consider only the following particular form of $\phi(x)$ :
$$\phi(x) = a_0 + a_1 x + a_2 x^2 + \cdots + a_n x^n,$$
where n is a positive integer and $a_n \ne 0$. $\phi(x)$, as defined here, is
known as a polynomial or a rational integral function in x of degree n.


*In problems of interpolation, the function is generally referred to as the entry 
and the independent variable as the argument. 




The justification for the replacement of f(x) by a polynomial $\phi(x)$
lies in an important theorem due to Weierstrass, which says that if
f(x) is continuous between $x_1$ and $x_n$, then it can be replaced by a
polynomial of suitable degree in that interval so that the difference
between f(x) and $\phi(x)$ will be as small as desired.

2b.2 Finite differences 

When the values of the argument are equidistant, the process of
interpolation is facilitated by the use of what are called finite
differences.

Let h be some positive constant which we shall call the difference
interval. Then, by definition,
$$\Delta f(x) = f(x+h) - f(x) \tag{2.7}$$
is the first difference of f(x). This is also sometimes referred to as the
first descending (or forward) difference of f(x). Here $\Delta$ represents an
operation and is not a quantity.
Similarly,
$$\Delta^2 f(x) = \Delta\{\Delta f(x)\} = \Delta f(x+h) - \Delta f(x) = f(x+2h) - 2f(x+h) + f(x)$$
is the second difference of f(x). The higher order differences (the third,
fourth, etc.) are defined in a similar manner :
$$\Delta^r f(x) = \{\Delta\Delta\Delta \ldots r \text{ times}\} f(x). \tag{2.8}$$
It must be always understood that $\Delta^r$ is not the rth power of a
quantity, but simply the repetition of the operation represented by $\Delta$
r times, where r is a positive integer.
The differences of various orders of the function f(x) may be
systematically obtained by constructing a diagonal difference table
as follows :


Argument   Entry      1st difference   2nd difference   3rd difference

a          f(a)
                      Δf(a)
a+h        f(a+h)                      Δ²f(a)
                      Δf(a+h)                           Δ³f(a)
a+2h       f(a+2h)                     Δ²f(a+h)
                      Δf(a+2h)
a+3h       f(a+3h)


In the above table, f(a) is called the leading term and Δf(a),
Δ²f(a), Δ³f(a), ...... the leading differences.




It may be noted that to obtain $\Delta^r f(a)$ it is essential to know the
values f(a), f(a+h), ......, f(a+rh). It is quite easy to form a
difference table for a set of data, which is illustrated in the following
example.


Example 2.3 We draw up the difference table for the data of the
first two columns.


   x     f(x)    Δf(x)    Δ²f(x)    Δ³f(x)    Δ⁴f(x)

   2      17
                  −15
   4       2                14
                   −1                 −12
   6       1                 2                   24
                    1                  12
   8       2                14                   24
                   15                  36
  10      17                50
                   65
  12      82
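
A difference table such as the one in Example 2.3 can be built mechanically. The following is a minimal sketch; the function name difference_table is an assumption made for illustration, and the entries are taken to correspond to equidistant arguments.

```python
def difference_table(values):
    """Return [entries, first differences, second differences, ...]."""
    columns = [list(values)]
    while len(columns[-1]) > 1:
        prev = columns[-1]
        # Each new column holds the successive differences of the previous one.
        columns.append([b - a for a, b in zip(prev, prev[1:])])
    return columns

# Entries f(x) of Example 2.3 for x = 2, 4, ..., 12.
table = difference_table([17, 2, 1, 2, 17, 82])
for order, column in enumerate(table):
    print(order, column)
```

The first element of each column is the corresponding leading difference.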


Let us consider a polynomial of degree n :
$$\phi(x) = a_0 + a_1 x + a_2 x^2 + \cdots + a_n x^n.$$
Then
$$\Delta\phi(x) = \phi(x+h) - \phi(x) = a_1 h + a_2\{(x+h)^2 - x^2\} + a_3\{(x+h)^3 - x^3\} + \cdots + a_n\{(x+h)^n - x^n\}, \tag{2.9}$$
which is a polynomial of degree (n−1).

The second difference $\Delta^2\phi(x)$, being the first difference of a polynomial
of degree (n−1), is a polynomial of degree (n−2). Thus by
taking difference once, the degree of the polynomial is reduced by
unity. Hence $\Delta^n\phi(x)$ will be a polynomial of degree zero, i.e. a
constant. $\Delta^{n+1}\phi(x)$ and higher order differences are all zeros.

This result, together with its converse, which also can be shown to be true,
will be useful in solving interpolation problems.


2b.3 Error in a tabular value 

Let $y_0, y_1, \ldots, y_{10}$ be the correct values of the entry y = f(x)
corresponding to the equidistant values $x_0, x_1, \ldots, x_{10}$ of the
argument x. Suppose further that the central value $y_5$ is in error
and the value recorded for it is $y_5 + \epsilon$. We construct the diagonal
difference table and exhibit the manner in which this error in $y_5$ is
propagated through the various differences.


x      y=f(x)   Δy       Δ²y        Δ³y        Δ⁴y        Δ⁵y

x₀     y₀
                Δy₀
x₁     y₁                Δ²y₀
                Δy₁                 Δ³y₀
x₂     y₂                Δ²y₁                  Δ⁴y₀
                Δy₂                 Δ³y₁                  Δ⁵y₀+ε
x₃     y₃                Δ²y₂                  Δ⁴y₁+ε
                Δy₃                 Δ³y₂+ε                Δ⁵y₁−5ε
x₄     y₄                Δ²y₃+ε                Δ⁴y₂−4ε
                Δy₄+ε               Δ³y₃−3ε               Δ⁵y₂+10ε
x₅     y₅+ε              Δ²y₄−2ε               Δ⁴y₃+6ε
                Δy₅−ε               Δ³y₄+3ε               Δ⁵y₃−10ε
x₆     y₆                Δ²y₅+ε                Δ⁴y₄−4ε
                Δy₆                 Δ³y₅−ε                Δ⁵y₄+5ε
x₇     y₇                Δ²y₆                  Δ⁴y₅+ε
                Δy₇                 Δ³y₆                  Δ⁵y₅−ε
x₈     y₈                Δ²y₇                  Δ⁴y₆
                Δy₈                 Δ³y₇
x₉     y₉                Δ²y₈
                Δy₉
x₁₀    y₁₀


We observe the following facts from the above table :

More and more differences are affected by the error ε as we take
differences of higher and higher orders. The maximum absolute
error is along the line through $y_5$ or nearest this line on either side.
The coefficients of ε in a column are alternately positive and
negative, being the terms of the binomial expansion of $(1-1)^n$. The
algebraic sum of the errors in each difference column is zero. If
the function y = f(x) be a polynomial of degree 4, then the $\Delta^5 y$ values
will be solely due to error and they will be symmetrical around
the line through $y_5$. Also, the sum of the $\Delta^5 y$ values will be zero.

So we work out the differences till we come to a stage where the
values are alternating in signs and symmetrical about a value, and
their algebraic sum is zero. Then the line through the maximum
absolute error or midway between the two largest absolute errors
will locate the error in the entry. It is corrected by obtaining ε from
the last difference column, which is solely due to error, and subtracting
this from $y_5 + \epsilon$ ; since $(y_5 + \epsilon) - \epsilon = y_5$, the true value is found.




We have considered above the simplest case, and it will be clear 
from the above difference table that if the error is not in the central 
part of the table and in each difference column we do not get all the 
differences affected by e, then the observations we have made in the 
preceding paragraph will not be realised. There will be further 
complications if more than one value is in error. Bearing these in 
mind, we illustrate the method with a simple example. 

Example 2.4 Let us locate and correct an error in the values of
a function, using the following difference table :


   x       y       Δy      Δ²y      Δ³y      Δ⁴y

  −1     −25
                   26
   1       1              −24
                    2                48
   3       3               24                  6
                   26                54
   5      29               78                −24
                  104                30
   7     133              108                 36
                  212                66
   9     345              174                −24
                  386                42
  11     731              216                  6
                  602                48
  13    1333              264
                  866
  15    2199


We find that the Δ⁴y values are symmetrical about the line
through f(7) = 133, are alternating in sign and the algebraic sum of
the Δ⁴y values is zero. Hence we conclude that the Δ⁴y values are
solely due to error in f(7). By referring to the earlier table, we
find that the values in the Δ⁴y column are ε, −4ε, +6ε, −4ε and ε,
respectively. Hence estimating ε from any one of these, say from
6ε = 36, we get ε = 6, and the corrected value of f(7) is 133 − 6 = 127.
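
The procedure of Example 2.4 can be sketched in code as follows. The function names and the restriction to an even difference order are assumptions made for this illustration; the sketch also assumes, as in the text, that exactly one entry is wrong, that it lies far enough from the ends of the table, and that the differences of the stated order would vanish for the correct values.

```python
from math import comb

def differences(values, order):
    """Return the column of differences of the given order."""
    column = list(values)
    for _ in range(order):
        column = [b - a for a, b in zip(column, column[1:])]
    return column

def correct_single_error(y, order=4):
    """Locate and remove one erroneous entry (even `order` assumed)."""
    d = differences(y, order)
    # The largest difference in absolute value lies on the line through the error.
    k = max(range(len(d)), key=lambda i: abs(d[i]))
    # Middle binomial coefficient with its sign, e.g. +6 for order 4.
    coefficient = (-1) ** (order // 2) * comb(order, order // 2)
    eps = d[k] / coefficient
    bad_index = k + order // 2
    fixed = list(y)
    fixed[bad_index] -= eps
    return bad_index, eps, fixed

y = [-25, 1, 3, 29, 133, 345, 731, 1333, 2199]   # entries of Example 2.4
bad_index, eps, fixed = correct_single_error(y)
print(bad_index, eps, fixed[bad_index])
```

After the correction the fourth differences of the entries all vanish, which is a convenient check that a single error accounted for the whole of the last column.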


2b.4 Use of the operators 4, E 
We have already introduced the operator $\Delta$ in Section 2b.2.
We now define the second operator E as
$$E f(x) = f(x+h),$$
where h is the difference interval ;
$$E^2 f(x) = E\{E f(x)\} = f(x+2h) ;$$
and so on ; generally, for a positive integer n,
$$E^n f(x) = \{EE \ldots n \text{ times}\} f(x) = f(x+nh).$$

More generally, we shall denote f(x+uh) by $E^u f(x)$, where u
may be positive, negative or fractional.

Since
$$\Delta f(x) = f(x+h) - f(x) = E f(x) - f(x) = (E-1) f(x),$$
one may take $\Delta = E - 1$ or $E = 1 + \Delta$. The relation, it should be
noted, is one of equivalence and not of equality.

From these, we have the general equivalence relations :
$$E^n = (1+\Delta)^n \quad \text{and} \quad \Delta^n = (E-1)^n. \tag{2.10}$$

Further, it can be proved that for any f(x), and any positive
integer n,
$$f(x+nh) = E^n f(x) = (1+\Delta)^n f(x) = f(x) + \binom{n}{1}\Delta f(x) + \binom{n}{2}\Delta^2 f(x) + \cdots + \Delta^n f(x) \tag{2.11a}$$
and
$$\Delta^n f(x) = (E-1)^n f(x) = f(x+nh) - \binom{n}{1} f(x+\overline{n-1}\,h) + \binom{n}{2} f(x+\overline{n-2}\,h) - \cdots + (-1)^n f(x). \tag{2.11b}$$

The above binomial expansions are also valid for any negative or
fractional n, if f(x) is a polynomial in x.
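
The two expansions (2.11a) and (2.11b) can be checked numerically. In the sketch below the function names delta and shift_via_differences, and the cubic used as a test function, are illustrative assumptions and not part of the text.

```python
from math import comb

def delta(f, x, h, n=1):
    """n-th forward difference of f at x, expanded as in (2.11b)."""
    return sum((-1) ** k * comb(n, k) * f(x + (n - k) * h) for k in range(n + 1))

def shift_via_differences(f, x, h, n):
    """f(x + nh) built from the differences at x, as in (2.11a)."""
    return sum(comb(n, k) * delta(f, x, h, k) for k in range(n + 1))

def f(x):
    # Hypothetical test function: a cubic, so its fourth differences vanish.
    return x ** 3 - 2 * x + 1

print(shift_via_differences(f, 1, 2, 4), f(1 + 4 * 2))
print(delta(f, 1, 2, 4))
```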

Although 4 and E are not numbers, they satisfy the ordinary laws 
of algebra—the law of commutation, law of distribution and law of 
indices. Many problems may be solved by using these operators. 

Example 2.5 The values of a function f(x) are given below for
equidistant values of x :


2-70 
The value of f(7) is unknown. Obtain as good an approximation
as possible for f(7).


ox oa |a 
N 
© 
a 




Since only four values of f(x) are given, we shall assume that f(x)
is a polynomial of third degree ; as a consequence, $\Delta^4 f(x)$ is to be
regarded as zero. Now,
$$\Delta^4 f(4) = 0$$
means
$$(E-1)^4 f(4) = 0,$$
or
$$(E^4 - 4E^3 + 6E^2 - 4E + 1) f(4) = 0,$$
or
$$f(8) - 4f(7) + 6f(6) - 4f(5) + f(4) = 0.$$
Hence
$$f(7) = \frac{f(8) + 6f(6) - 4f(5) + f(4)}{4} = 9.77.$$
This is a problem of finding a missing term. See also Exercise 2.22.
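
The missing-term relation used in Example 2.5 can be written as a one-line function. The function name missing_term is an assumption, and the check below uses the hypothetical cubic x³ rather than the data of the example, since for a cubic the relation holds exactly.

```python
def missing_term(f4, f5, f6, f8):
    """Solve f(8) - 4 f(7) + 6 f(6) - 4 f(5) + f(4) = 0 for f(7)."""
    return (f8 + 6 * f6 - 4 * f5 + f4) / 4

# Hypothetical check with f(x) = x**3, for which the fourth difference is zero.
estimate = missing_term(4 ** 3, 5 ** 3, 6 ** 3, 8 ** 3)
print(estimate)
```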

Example 2.6 The table below gives the average number of years
of life ($e_x^0$) remaining to persons who survive to exact age x, for the
male African population of Belgian Congo :


    x      e_x^0

    0      37.64
    5      44.04
   10      41.40
   15      37.78
   20      34.41


Obtain $e_x^0$ for x = 1, 2, 3 and 4.

These can be obtained by applying Newton's forward interpolation
formula. But when from the given table with equidistant
values of the argument we want to form a new table with finer
intervals, the new values of the argument also being equidistant, we
should adopt the following simpler procedure.

Let $\Delta e_0^0, \Delta^2 e_0^0, \ldots$ denote the differences for the given table with
interval 5 and $\delta e_0^0, \delta^2 e_0^0, \ldots$, etc., denote the differences for the new
table to be formed with interval 1. Then using the properties of
$\Delta$ and E, we have for the given table
$$e_5^0 = (1+\Delta) e_0^0$$
and for the new table to be formed
$$e_5^0 = (1+\delta)^5 e_0^0,$$
so that
$$(1+\Delta) e_0^0 = (1+\delta)^5 e_0^0.$$

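That is,
$$1 + \Delta = (1+\delta)^5,$$
or
$$\delta = (1+\Delta)^{1/5} - 1,$$
or
$$\delta = 0.2\Delta - 0.08\Delta^2 + 0.048\Delta^3 - 0.0336\Delta^4 + \cdots.$$
Again,
$$\delta^2 = (0.2\Delta - 0.08\Delta^2 + 0.048\Delta^3 - \cdots)^2,$$
or
$$\delta^2 = 0.04\Delta^2 - 0.032\Delta^3 + 0.0256\Delta^4 - \cdots.$$
Similarly,
$$\delta^3 = 0.008\Delta^3 - 0.0096\Delta^4 + \cdots,$$
and
$$\delta^4 = 0.0016\Delta^4 - \cdots,$$
and so on.


The leading differences for the given table are
$$\Delta e_0^0 = 6.40, \quad \Delta^2 e_0^0 = -9.04, \quad \Delta^3 e_0^0 = 8.06 \quad \text{and} \quad \Delta^4 e_0^0 = -6.83.$$


Here the fourth order differences $\Delta^4 e_x^0$ are assumed to be constant,
which implies that $\delta^4 e_x^0$ are also constant. As such, we take
$$\delta e_0^0 = (0.2\Delta - 0.08\Delta^2 + 0.048\Delta^3 - 0.0336\Delta^4) e_0^0 = 0.2\Delta e_0^0 - 0.08\Delta^2 e_0^0 + 0.048\Delta^3 e_0^0 - 0.0336\Delta^4 e_0^0 = 2.619568,$$
$$\delta^2 e_0^0 = (0.04\Delta^2 - 0.032\Delta^3 + 0.0256\Delta^4) e_0^0 = 0.04\Delta^2 e_0^0 - 0.032\Delta^3 e_0^0 + 0.0256\Delta^4 e_0^0 = -0.794368,$$
$$\delta^3 e_0^0 = (0.008\Delta^3 - 0.0096\Delta^4) e_0^0 = 0.008\Delta^3 e_0^0 - 0.0096\Delta^4 e_0^0 = 0.130048,$$
and
$$\delta^4 e_0^0 = 0.0016\Delta^4 e_0^0 = -0.010928.$$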




Taking these as leading differences for the new difference table
with interval of differencing 1, the table can be completed by using
the relations :
$$\delta^3 e_1^0 = \delta^3 e_0^0 + \delta^4 e_0^0, \quad \delta^3 e_2^0 = \delta^3 e_1^0 + \delta^4 e_1^0,$$
$$\delta^2 e_1^0 = \delta^2 e_0^0 + \delta^3 e_0^0, \quad \delta^2 e_2^0 = \delta^2 e_1^0 + \delta^3 e_1^0,$$
etc. This is shown below :


  x     e_x^0         δe_x^0       δ²e_x^0       δ³e_x^0      δ⁴e_x^0

  0    37.64
                     2.619568
  1    40.259568                  −0.794368
                     1.825200                    0.130048
  2    42.084768                  −0.664320                  −0.010928
                     1.160880                    0.119120
  3    43.245648                  −0.545200                  −0.010928
                     0.615680                    0.108192
  4    43.861328                  −0.437008
                     0.178672
  5    44.04


From the above table, the required values of $e_x^0$ are found to be
$e_1^0 = 40.26$, $e_2^0 = 42.08$, $e_3^0 = 43.25$ and $e_4^0 = 43.86$.
The process illustrated in this example is known as subtabulation.
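
The subtabulation of Example 2.6 can be reproduced with the sketch below. The variable and function names are illustrative assumptions; the numerical coefficients are those of the series for δ, δ², δ³ and δ⁴ derived in the example, truncated at Δ⁴, so the sketch applies only to reducing an interval of 5 to an interval of 1 with fourth differences taken as constant.

```python
def leading_differences(values):
    """Return the leading differences of the first, second, ... orders."""
    leads, column = [], list(values)
    while len(column) > 1:
        column = [b - a for a, b in zip(column, column[1:])]
        leads.append(column[0])
    return leads

e = [37.64, 44.04, 41.40, 37.78, 34.41]        # entries at x = 0, 5, 10, 15, 20
D1, D2, D3, D4 = leading_differences(e)

# Leading differences of the new table (interval 1), series truncated at the 4th order.
d1 = 0.2 * D1 - 0.08 * D2 + 0.048 * D3 - 0.0336 * D4
d2 = 0.04 * D2 - 0.032 * D3 + 0.0256 * D4
d3 = 0.008 * D3 - 0.0096 * D4
d4 = 0.0016 * D4

# Build the new table by repeated addition, the fourth difference being constant.
subtable = [e[0]]
y, a, b, c = e[0], d1, d2, d3
for _ in range(5):
    y, a, b, c = y + a, a + b, b + c, c + d4
    subtable.append(y)

print([round(v, 2) for v in subtable])
```

The last value of the new table falls back on the tabulated entry at x = 5, which serves as a check on the arithmetic.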

Example 2.7 Prove the identity
$$f(a) + f(a+h)\frac{x}{1!} + f(a+2h)\frac{x^2}{2!} + f(a+3h)\frac{x^3}{3!} + \cdots = \exp[x]\left[f(a) + x\Delta f(a) + \frac{x^2\Delta^2 f(a)}{2!} + \frac{x^3\Delta^3 f(a)}{3!} + \cdots\right].$$
This will be proved by using the equivalence relations $E = 1 + \Delta$
and $E^n = (1+\Delta)^n$. Also, the function f(a) will be separated from the
operators and will be re-introduced at the last stage. This is known
as the method of separation of symbols.
In the present problem,
$$\text{l.h.s.} = f(a) + E f(a)\frac{x}{1!} + E^2 f(a)\frac{x^2}{2!} + E^3 f(a)\frac{x^3}{3!} + \cdots$$
$$= \exp[Ex] f(a) = \exp[x + \Delta x] f(a)$$
$$= \exp[x]\exp[\Delta x] f(a)$$
$$= \exp[x]\left\{1 + \frac{\Delta x}{1!} + \frac{\Delta^2 x^2}{2!} + \frac{\Delta^3 x^3}{3!} + \cdots\right\} f(a)$$
$$= \exp[x]\left[f(a) + x\Delta f(a) + \frac{x^2\Delta^2 f(a)}{2!} + \frac{x^3\Delta^3 f(a)}{3!} + \cdots\right]$$
$$= \text{r.h.s.}$$

For some similar problems, see Exercise 2.7.



2b.5 Newton's forward interpolation formula 

Let $y_0, y_1, \ldots, y_n$ be the values of the function y = f(x) corresponding
to the (n+1) equidistant values of the argument $x_0, x_1, \ldots, x_n$,
the common difference being h. With these (n+1) values, we can
take as our interpolation formula a polynomial of degree n. A polynomial
of a higher degree will contain more than (n+1) constants,
which cannot be determined when only (n+1) values of y are given.

Let us take the nth degree polynomial in the form
$$\phi(x) = a_0 + a_1(x-x_0) + a_2(x-x_0)(x-x_1) + \cdots + a_n(x-x_0)(x-x_1)\cdots(x-x_{n-1}). \tag{2.10}$$
The constants $a_0, a_1, \ldots, a_n$ are determined by solving the
equations
$$y_0 = \phi(x_0),\ y_1 = \phi(x_1),\ \ldots,\ y_n = \phi(x_n).$$
Now, $\phi(x_0) = a_0$, so that
$$a_0 = y_0.$$
Again, $\phi(x_1) = a_0 + a_1(x_1 - x_0)$.
Equating this to $y_1$, we have
$$y_0 + a_1 h = y_1,$$
giving
$$a_1 = \frac{y_1 - y_0}{h} = \frac{\Delta y_0}{h}.$$
Also,
$$\phi(x_2) = a_0 + a_1(x_2 - x_0) + a_2(x_2 - x_0)(x_2 - x_1).$$
Equating this to $y_2$, we get
$$y_0 + \frac{\Delta y_0}{h}\cdot 2h + a_2 \cdot 2h^2 = y_2,$$
which leads to
$$a_2 = \frac{y_2 - y_0 - 2\Delta y_0}{2h^2} = \frac{y_2 - 2y_1 + y_0}{2!\,h^2} = \frac{\Delta^2 y_0}{2!\,h^2},$$
and so on.
Lastly, equating $\phi(x_n)$ to $y_n$, we find
$$a_n = \frac{\Delta^n y_0}{n!\,h^n}.$$
Substituting these values for $a_0, a_1, \ldots, a_n$ in (2.10), one gets
$$\phi(x) = y_0 + (x-x_0)\frac{\Delta y_0}{h} + (x-x_0)(x-x_1)\frac{\Delta^2 y_0}{2!\,h^2} + \cdots + (x-x_0)(x-x_1)\cdots(x-x_{n-1})\frac{\Delta^n y_0}{n!\,h^n}. \tag{2.11}$$

This is Newton's forward interpolation formula. This may be put in
a simpler form by substituting
$$u = (x - x_0)/h.$$
The formula then reduces to
$$\phi(x) = \phi(x_0 + uh) = y_0 + u\Delta y_0 + \frac{u(u-1)}{2!}\Delta^2 y_0 + \cdots + \frac{u(u-1)(u-2)\cdots(u-n+1)}{n!}\Delta^n y_0. \tag{2.12}$$

Since the formula contains values of the tabulated function
beginning from $y_0$ forward (to the right) and none backward it is
called a forward interpolation formula. It is used mainly for interpolation
near the beginning of a set of tabulated values.

It is obvious that if y = f(x) be a polynomial of degree r (< n),
either exactly or approximately, then one need take in the forward
formula only the first (r+1) terms of (2.12).
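
Formula (2.12) translates directly into a short routine. The sketch below is only an illustration: the names leading_differences and newton_forward and the optional order argument (the number of difference orders retained) are assumptions, and the data are the values of e^x tabulated in Example 2.8.

```python
def leading_differences(y):
    """Return [y0, first leading difference, second leading difference, ...]."""
    leads, column = [y[0]], list(y)
    while len(column) > 1:
        column = [b - a for a, b in zip(column, column[1:])]
        leads.append(column[0])
    return leads

def newton_forward(x0, h, y, x, order=None):
    """Evaluate formula (2.12) at x, keeping differences up to `order`."""
    leads = leading_differences(y)
    if order is None:
        order = len(y) - 1
    u = (x - x0) / h
    total, term = 0.0, 1.0           # term holds u(u-1)...(u-k+1)/k!
    for k in range(order + 1):
        total += term * leads[k]
        term *= (u - k) / (k + 1)
    return total

# Values of e^x for x = 0.12, 0.13, ..., 0.19 (table of Example 2.8).
exp_table = [1.127497, 1.138828, 1.150274, 1.161834,
             1.173511, 1.185305, 1.197217, 1.209250]
print(round(newton_forward(0.12, 0.01, exp_table, 0.1245, order=2), 6))
```

Keeping order=2 mirrors the choice of a second-degree polynomial when the second differences are nearly constant.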


2b.6 Newton’s backward interpolation formula 
Newton's forward formula is not used for interpolation near the
end of a set of tabulated values. For this purpose we have another
formula, which is also due to Newton. As before, let there be (n+1)
pairs of values of the argument x and the entry y = f(x) :
$$(x_0, y_0),\ (x_1, y_1),\ \ldots,\ (x_n, y_n),$$
the x's being equidistant with common difference h. In this case, the
approximating polynomial $\phi(x)$ of degree n is taken as
$$\phi(x) = b_0 + b_1(x-x_n) + b_2(x-x_n)(x-x_{n-1}) + \cdots + b_n(x-x_n)(x-x_{n-1})\cdots(x-x_1). \tag{2.13}$$
The constants $b_0, b_1, \ldots, b_n$ are determined by solving the
equations
$$y_n = \phi(x_n),\ y_{n-1} = \phi(x_{n-1}),\ \ldots,\ y_0 = \phi(x_0).$$
Now,
$$\phi(x_n) = b_0,$$
so that $b_0 = y_n$.
Again,
$$\phi(x_{n-1}) = b_0 + b_1(x_{n-1} - x_n).$$
Equating this to $y_{n-1}$, we have
$$y_n - b_1 h = y_{n-1},$$
giving
$$b_1 = \frac{y_n - y_{n-1}}{h} = \frac{\Delta y_{n-1}}{h}.$$
Again,
$$\phi(x_{n-2}) = b_0 + b_1(x_{n-2} - x_n) + b_2(x_{n-2} - x_n)(x_{n-2} - x_{n-1}).$$
Equating this to $y_{n-2}$, we get
$$y_n - \frac{\Delta y_{n-1}}{h}\cdot 2h + b_2 \cdot 2h^2 = y_{n-2},$$
which leads to
$$b_2 = \frac{y_{n-2} - y_n + 2\Delta y_{n-1}}{2h^2} = \frac{y_n - 2y_{n-1} + y_{n-2}}{2!\,h^2} = \frac{\Delta^2 y_{n-2}}{2!\,h^2},$$
and so on.
Lastly, equating $\phi(x_0)$ to $y_0$, we find
$$b_n = \frac{\Delta^n y_0}{n!\,h^n}.$$
Substituting these values for $b_0, b_1, \ldots, b_n$ in (2.13), one gets
$$\phi(x) = y_n + (x-x_n)\frac{\Delta y_{n-1}}{h} + (x-x_n)(x-x_{n-1})\frac{\Delta^2 y_{n-2}}{2!\,h^2} + \cdots + (x-x_n)(x-x_{n-1})\cdots(x-x_1)\frac{\Delta^n y_0}{n!\,h^n}. \tag{2.14}$$

This is Newton's backward interpolation formula.
If we put $u = (x - x_n)/h$, the formula reduces to a simpler form :
$$\phi(x) = \phi(x_n + uh) = y_n + u\Delta y_{n-1} + \frac{u(u+1)}{2!}\Delta^2 y_{n-2} + \cdots + \frac{u(u+1)(u+2)\cdots(u+n-1)}{n!}\Delta^n y_0. \tag{2.15}$$
This contains values of the function beginning from $y_n$ all backward
(to the left) and none forward. Hence the name 'backward
interpolation formula'.
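
The backward formula (2.15) can be sketched in the same way as the forward one. The names trailing_differences and newton_backward and the optional order argument are assumptions made for this illustration; the data are again the values of e^x tabulated in Example 2.8.

```python
def trailing_differences(y):
    """Return [y_n, last first difference, last second difference, ...]."""
    trail, column = [y[-1]], list(y)
    while len(column) > 1:
        column = [b - a for a, b in zip(column, column[1:])]
        trail.append(column[-1])
    return trail

def newton_backward(xn, h, y, x, order=None):
    """Evaluate formula (2.15) at x, keeping differences up to `order`."""
    trail = trailing_differences(y)
    if order is None:
        order = len(y) - 1
    u = (x - xn) / h
    total, term = 0.0, 1.0           # term holds u(u+1)...(u+k-1)/k!
    for k in range(order + 1):
        total += term * trail[k]
        term *= (u + k) / (k + 1)
    return total

# Values of e^x for x = 0.12, 0.13, ..., 0.19 (table of Example 2.8).
exp_table = [1.127497, 1.138828, 1.150274, 1.161834,
             1.173511, 1.185305, 1.197217, 1.209250]
print(round(newton_backward(0.19, 0.01, exp_table, 0.1895, order=2), 6))
```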


Example 2.8 From the following table, determine $e^{0.1245}$ and $e^{0.1895}$.


    x        e^x

   0.12    1.127497
   0.13    1.138828
   0.14    1.150274
   0.15    1.161834
   0.16    1.173511
   0.17    1.185305
   0.18    1.197217
   0.19    1.209250


First of all, we construct the diagonal difference table :


    x        e^x         Δe^x        Δ²e^x

   0.12    1.127497
                        0.011331
   0.13    1.138828                 0.000115
                        0.011446
   0.14    1.150274                 0.000114
                        0.011560
   0.15    1.161834                 0.000117
                        0.011677
   0.16    1.173511                 0.000117
                        0.011794
   0.17    1.185305                 0.000118
                        0.011912
   0.18    1.197217                 0.000121
                        0.012033
   0.19    1.209250


In the present case, the second differences are approximately
constant, which indicates that the appropriate interpolation formula
may be taken to be a polynomial of degree 2 (vide the concluding
lines of Section 2b.2).

To determine $e^{0.1245}$ we have to interpolate at the beginning of
the table, in which the values of the argument are equidistant.
Hence we should use in this case the forward formula.

Here
$$h = 0.01,\quad x_0 = 0.12 \quad \text{and} \quad x = 0.1245,$$
so that
$$u = \frac{0.1245 - 0.12}{0.01} = 0.45.$$

Using formula (2.12), we have, therefore, as an approximate
value of $e^{0.1245}$,
$$\phi(0.1245) = y_0 + u\Delta y_0 + \frac{u(u-1)}{2!}\Delta^2 y_0$$
$$= 1.127497 + 0.00509895 - 0.00001423$$
$$= 1.13258172, \text{ i.e. } 1.132582 \text{ (correct to 6 decimal places).}$$
For determining $e^{0.1895}$, we are to interpolate near the end of the
table, for which we use the backward formula. In this case,
$$h = 0.01,\quad x_n = 0.19 \quad \text{and} \quad x = 0.1895,$$
so that
$$u = \frac{0.1895 - 0.19}{0.01} = -0.05.$$
Using formula (2.15), we have, as an approximate value of $e^{0.1895}$,
$$\phi(0.1895) = y_n + u\Delta y_{n-1} + \frac{u(u+1)}{2!}\Delta^2 y_{n-2}$$
$$= 1.209250 - 0.00060165 - 0.00000287$$
$$= 1.20864548, \text{ i.e. } 1.208645 \text{ (correct to 6 decimal places).}$$

Example 2.9 Determine $e^{0.1175}$ and $e^{0.1911}$ from the table of
Example 2.8.

To determine $e^{0.1175}$ and $e^{0.1911}$ we are required to find values of
the tabulated function $e^x$ corresponding to values of x outside the
range of the given values (from 0.12 to 0.19). This is not a problem
in interpolation, but one in extrapolation. But here, too, one may use
either Newton's forward formula or his backward formula, depending
on whether the required value is for x less than $x_0$, the smallest
tabulated value of the argument, or for x greater than $x_n$, the largest
value. In the case of extrapolation, we assume that the function
y = f(x) is smooth near the ends of the range and that we are not
extrapolating beyond a distance h on either side.

To extrapolate for x = 0.1175, we take $x_0 = 0.12$, x = 0.1175 and
h = 0.01, so that
$$u = \frac{0.1175 - 0.12}{0.01} = -0.25.$$

Using formula (2.12), we have, as an approximate value of $e^{0.1175}$,
$$\phi(0.1175) = y_0 + u\Delta y_0 + \frac{u(u-1)}{2!}\Delta^2 y_0$$
$$= 1.127497 - 0.25 \times 0.011331 + \frac{(-0.25)(-1.25)}{2} \times 0.000115$$
$$= 1.127497 - 0.00283275 + 0.00001797$$
$$= 1.12468222, \text{ i.e. } 1.124682 \text{ (correct to 6 decimal places).}$$
For determining $e^{0.1911}$, we take x = 0.1911, $x_n = 0.19$ and h = 0.01,
so that
$$u = \frac{0.1911 - 0.19}{0.01} = 0.11,$$
and using formula (2.15), an approximate value of $e^{0.1911}$ is obtained as
$$\phi(0.1911) = y_n + u\Delta y_{n-1} + \frac{u(u+1)}{2!}\Delta^2 y_{n-2}$$
$$= 1.209250 + 0.11 \times 0.012033 + \frac{0.11 \times 1.11}{2} \times 0.000121$$
$$= 1.209250 + 0.00132363 + 0.00000739$$
$$= 1.21058102, \text{ i.e. } 1.210581 \text{ (correct to 6 decimal places).}$$


2b.7 Lagrange’s interpolation formula 

The above formulae can be used only when the given values of
the argument are equidistant. But sometimes it may be difficult to
obtain tabulated values of a function for equidistant values of the
argument. To deal with such cases, we require a third formula
which can be used even when the values of x are not equidistant.

Suppose we have (n+1) pairs of values of x and y = f(x), viz.
$(x_0, y_0), (x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$. Here again we take as
our interpolation formula a polynomial of degree n. Let us take the
polynomial in the form
$$\phi(x) = c_0(x-x_1)(x-x_2)\cdots(x-x_n) + c_1(x-x_0)(x-x_2)\cdots(x-x_n) + \cdots$$
$$+ c_r(x-x_0)(x-x_1)\cdots(x-x_{r-1})(x-x_{r+1})\cdots(x-x_n) + \cdots + c_n(x-x_0)(x-x_1)\cdots(x-x_{n-1}). \tag{2.16}$$
The constants $c_0, c_1, \ldots, c_n$ are determined, as in the previous
cases, from the equations
$$y_0 = \phi(x_0),\ y_1 = \phi(x_1),\ \ldots,\ y_n = \phi(x_n).$$
Now, $\phi(x_0) = c_0(x_0-x_1)(x_0-x_2)\cdots(x_0-x_n)$, which on being equated
to $y_0$ gives
$$c_0 = \frac{y_0}{(x_0-x_1)(x_0-x_2)\cdots(x_0-x_n)}.$$
Again, $\phi(x_1) = c_1(x_1-x_0)(x_1-x_2)\cdots(x_1-x_n)$. When it is equated
to $y_1$, we get
$$c_1 = \frac{y_1}{(x_1-x_0)(x_1-x_2)\cdots(x_1-x_n)}.$$
Similarly for the other constants.

Substituting these values for $c_0, c_1, \ldots, c_n$ in (2.16), we get
$$\phi(x) = \frac{(x-x_1)(x-x_2)\cdots(x-x_n)}{(x_0-x_1)(x_0-x_2)\cdots(x_0-x_n)}\,y_0 + \frac{(x-x_0)(x-x_2)\cdots(x-x_n)}{(x_1-x_0)(x_1-x_2)\cdots(x_1-x_n)}\,y_1 + \cdots + \frac{(x-x_0)(x-x_1)\cdots(x-x_{n-1})}{(x_n-x_0)(x_n-x_1)\cdots(x_n-x_{n-1})}\,y_n. \tag{2.17}$$

This is Lagrange's interpolation formula, which is sometimes given in
a different form that helps to minimize computational labour :
$$\frac{\phi(x)}{(x-x_0)(x-x_1)\cdots(x-x_n)} = \frac{y_0}{(x-x_0)(x_0-x_1)\cdots(x_0-x_n)} + \frac{y_1}{(x-x_1)(x_1-x_0)(x_1-x_2)\cdots(x_1-x_n)} + \cdots + \frac{y_n}{(x-x_n)(x_n-x_0)\cdots(x_n-x_{n-1})}. \tag{2.18}$$

If y is a function of x, in many cases it is also possible to look
upon x as a function of y, x = g(y), say. In such cases, by writing
Lagrange's formula in the form
$$x = \frac{(y-y_1)(y-y_2)\cdots(y-y_n)}{(y_0-y_1)(y_0-y_2)\cdots(y_0-y_n)}\,x_0 + \frac{(y-y_0)(y-y_2)\cdots(y-y_n)}{(y_1-y_0)(y_1-y_2)\cdots(y_1-y_n)}\,x_1 + \cdots + \frac{(y-y_0)(y-y_1)\cdots(y-y_{n-1})}{(y_n-y_0)(y_n-y_1)\cdots(y_n-y_{n-1})}\,x_n, \tag{2.19}$$
one may use it to determine the value of x for a given value of y, e.g.
to determine the value of x for which log x = 3.743 (vide table in
Example 2.12). This process is referred to as inverse interpolation.




It is seen that Lagrange's formula is of a more general nature
than Newton's formulae considered earlier. It is applicable in any
part of the table, and for this the values of the argument need not be
equidistant. The use of this formula, again, does not require the
construction of a difference table. But, on the whole, it is of a
more cumbrous form and its use will be comparatively laborious.
Naturally, where the application of Newton's forward or backward
formula is justified, this should be done in view of the resulting
saving in labour, which in some cases may be considerable.
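
Lagrange's formula (2.17), and its use for inverse interpolation as in (2.19), can be sketched as below. The function name lagrange is an assumption; the logarithm table is the one printed in Example 2.12, and inverse interpolation is obtained simply by interchanging the roles of the argument and the entry.

```python
def lagrange(xs, ys, x):
    """Value at x of the polynomial through the points (xs[r], ys[r])."""
    total = 0.0
    for r, (x_r, y_r) in enumerate(zip(xs, ys)):
        weight = 1.0
        for s, x_s in enumerate(xs):
            if s != r:
                weight *= (x - x_s) / (x_r - x_s)
        total += weight * y_r
    return total

# Direct use with unequally spaced arguments (hypothetical data from y = x**2 + 1).
print(lagrange([0, 1, 3], [1, 2, 10], 2))

# Inverse interpolation: treat log x as the argument and x as the entry.
x_tab = [5531, 5532, 5533, 5534, 5535]
log_tab = [3.7428037, 3.7428822, 3.7429607, 3.7430392, 3.7431176]
x_est = lagrange(log_tab, x_tab, 3.743)
print(x_est)
```

No difference table is needed, but each evaluation costs a double loop over the tabulated points, which reflects the remark above about the comparative labour of the formula.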

Example 2.10 Let us express Lagrange's formula in the following
form:

$$\phi(x) = \sum_{r=0}^{n} \frac{F(x)}{(x-x_r)\,F'(x_r)}\,y_r,$$

where $F(x) = \prod_{r=0}^{n}(x-x_r)$ and $F'(x_r)$ is the value of $F'(x)$ at $x = x_r$.

$F(x)$ being the product of $(n+1)$ factors, the derivative of $F(x)$
will be the sum of $(n+1)$ terms:

$$F'(x) = (x-x_1)(x-x_2)\cdots(x-x_n) + (x-x_0)(x-x_2)\cdots(x-x_n) + \cdots + (x-x_0)(x-x_1)\cdots(x-x_{n-1}).$$

Hence

$$F'(x_0) = (x_0-x_1)(x_0-x_2)\cdots(x_0-x_n),$$

and similarly for $F'(x_1), \ldots, F'(x_n)$. Thus it is now easy to verify that

$$\frac{F(x)}{(x-x_0)F'(x_0)},\quad \frac{F(x)}{(x-x_1)F'(x_1)},\quad \ldots,\quad \frac{F(x)}{(x-x_n)F'(x_n)}$$

are the same as the coefficients of $y_0, y_1, \ldots, y_n$ occurring on the
right-hand side of (2.17). This establishes that Lagrange's formula
is expressible in the alternative form

$$\phi(x) = \sum_{r=0}^{n} \frac{F(x)}{(x-x_r)\,F'(x_r)}\,y_r.$$


Example 2.11 Let us next show that the sum of the coefficients of
$y_0, y_1, \ldots, y_n$ in Lagrange's formula is unity. In other words, we
are to show that

$$\sum_{r=0}^{n} \frac{F(x)}{(x-x_r)\,F'(x_r)} = 1.$$

We deduce this result by decomposing $1/F(x)$ into partial fractions.
We assume

$$\frac{1}{F(x)} = \frac{A_0}{x-x_0} + \frac{A_1}{x-x_1} + \cdots + \frac{A_n}{x-x_n}, \qquad (2.20)$$

where $A_0, A_1, \ldots, A_n$ are independent of $x$, because to each linear
and non-repeated factor $(x-x_r)$ of $F(x)$ there corresponds a partial
fraction $A_r/(x-x_r)$, where $A_r$ is a constant, and $1/F(x)$ can be expressed as
a sum of such fractions.

Since $F(x) = (x-x_0)(x-x_1)\cdots(x-x_n)$, on simplification (2.20)
takes the form

$$1 = A_0(x-x_1)\cdots(x-x_n) + A_1(x-x_0)(x-x_2)\cdots(x-x_n) + \cdots + A_n(x-x_0)\cdots(x-x_{n-1}).$$

In the above identity we put $x = x_0, x_1, \ldots, x_n$ in succession and
thus derive

$$A_0 = \frac{1}{(x_0-x_1)(x_0-x_2)\cdots(x_0-x_n)} = \frac{1}{F'(x_0)},$$

$$A_1 = \frac{1}{(x_1-x_0)(x_1-x_2)\cdots(x_1-x_n)} = \frac{1}{F'(x_1)},$$

$$\cdots$$

$$A_n = \frac{1}{(x_n-x_0)(x_n-x_1)\cdots(x_n-x_{n-1})} = \frac{1}{F'(x_n)}.$$

Thus determining the constants $A_0, A_1, \ldots, A_n$, we now have,
from (2.20),

$$\frac{1}{F(x)} = \sum_{r=0}^{n} \frac{1}{(x-x_r)\,F'(x_r)}.$$

Multiplying both sides by $F(x)$, we get the desired result:

$$\sum_{r=0}^{n} \frac{F(x)}{(x-x_r)\,F'(x_r)} = 1.$$



Example 2.12

       x         log x
     5531      3.7428037
     5532      3.7428822
     5533      3.7429607
     5534      3.7430392
     5535      3.7431176

                y = 3.743       y_0          y_1          y_2          y_3        y_4
    y = 3.743       0
    y_0         0.0001963        0
    y_1         0.0001178   -0.0000785       0
    y_2         0.0000393   -0.0001570   -0.0000785       0
    y_3        -0.0000392   -0.0002355   -0.0001570   -0.0000785       0
    y_4        -0.0001176   -0.0003139   -0.0002354   -0.0001569   -0.0000784     0

In this table, each element is the value of $y$ in the corresponding
column minus the value of $y$ in the corresponding row. The elements
above the principal diagonal are not shown, because each of them is
equal in magnitude (but opposite in sign) to an element below this
diagonal; e.g., $(y_2-y_1) = -(y_1-y_2) = +0.0000785$.

Substituting these differences in (2.19), we have

$$\frac{(y-y_1)(y-y_2)(y-y_3)(y-y_4)}{(y_0-y_1)(y_0-y_2)(y_0-y_3)(y_0-y_4)} = \frac{1178\times 393\times 392\times 1176}{785\times 1570\times 2355\times 3139} = 0.0234250;$$

$$\frac{(y-y_0)(y-y_2)(y-y_3)(y-y_4)}{(y_1-y_0)(y_1-y_2)(y_1-y_3)(y_1-y_4)} = -\frac{1963\times 393\times 392\times 1176}{785\times 785\times 1570\times 2354} = -0.156157;$$

$$\frac{(y-y_0)(y-y_1)(y-y_3)(y-y_4)}{(y_2-y_0)(y_2-y_1)(y_2-y_3)(y_2-y_4)} = \frac{1963\times 1178\times 392\times 1176}{1570\times 785\times 785\times 1569} = 0.702259;$$

$$\frac{(y-y_0)(y-y_1)(y-y_2)(y-y_4)}{(y_3-y_0)(y_3-y_1)(y_3-y_2)(y_3-y_4)} = \frac{1963\times 1178\times 393\times 1176}{2355\times 1570\times 785\times 784} = 0.469666;$$

$$\frac{(y-y_0)(y-y_1)(y-y_2)(y-y_3)}{(y_4-y_0)(y_4-y_1)(y_4-y_2)(y_4-y_3)} = -\frac{1963\times 1178\times 393\times 392}{3139\times 2354\times 1569\times 784} = -0.0391929.$$

Hence an approximate value of antilog 3.743 is

$$x = g(3.743) = 0.0234250\times 5531 - 0.156157\times 5532 + 0.702259\times 5533 + 0.469666\times 5534 - 0.0391929\times 5535 = 5533.501.$$

Adjusting the decimal point, we get antilog $\bar{1}.743 = 0.5533501$.
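The inverse interpolation of Example 2.12 can be reproduced in a few lines. This is a sketch under the assumption that the five tabulated logarithms are as read from the example; the function name `lagrange` is illustrative. The roles of argument and entry are simply interchanged: log x is treated as the argument and x as the function value.

```python
def lagrange(args, entries, target):
    """Lagrange interpolation: value at `target` of the polynomial through
    the points (args[r], entries[r])."""
    total = 0.0
    for r in range(len(args)):
        term = entries[r]
        for s in range(len(args)):
            if s != r:
                term *= (target - args[s]) / (args[r] - args[s])
        total += term
    return total

x_tab = [5531, 5532, 5533, 5534, 5535]
log_tab = [3.7428037, 3.7428822, 3.7429607, 3.7430392, 3.7431176]

# Inverse interpolation: the logarithms are the arguments, x is the entry.
x_est = lagrange(log_tab, x_tab, 3.743)
```

Because the tabulated logarithms are not exactly equidistant, a difference-based formula could not be used directly here, whereas Lagrange's formula applies without change.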


2b.8 Divided differences 

When the values of the argument are not equidistant, we cannot
obtain the finite differences and so cannot use interpolation formulae
based on those differences. We have discussed Lagrange's formula
in Section 2b.7, which is useful in such cases.

When the values of the argument are not equidistant, we may,
alternatively, form what are called divided differences and can use
Newton's divided difference formula for interpolation.

Let the values of $f(x)$ be known for $x = x_0, x_1, \ldots, x_n$. Then, by
definition,

$$f(x_0, x_1) = \frac{f(x_1) - f(x_0)}{x_1 - x_0} \qquad (2.21)$$

is the divided difference of the first order obtained for the values of the
argument $x_0, x_1$. Similarly,

$$f(x_0, x_1, x_2) = \frac{f(x_1, x_2) - f(x_0, x_1)}{x_2 - x_0} \qquad (2.22)$$

is the divided difference of the second order obtained for the values of the
argument $x_0, x_1, x_2$. The higher order divided differences are defined
in a similar way.

The divided differences of various orders can be obtained
systematically by constructing a divided difference table.

The divided difference of a particular order can be expressed as
a symmetric function of the values of the argument involved in it.
For example,

$$f(x_0, x_1) = \frac{f(x_1) - f(x_0)}{x_1 - x_0} = \frac{f(x_0)}{x_0 - x_1} + \frac{f(x_1)}{x_1 - x_0} = \frac{f(x_0) - f(x_1)}{x_0 - x_1} = f(x_1, x_0).$$

And, in general,

$$f(x_0, x_1, \ldots, x_n) = \frac{f(x_0)}{(x_0-x_1)(x_0-x_2)\cdots(x_0-x_n)} + \frac{f(x_1)}{(x_1-x_0)(x_1-x_2)\cdots(x_1-x_n)} + \cdots + \frac{f(x_n)}{(x_n-x_0)(x_n-x_1)\cdots(x_n-x_{n-1})}. \qquad (2.23)$$


A result similar to one for the case of finite differences is the 
following : 

The divided difference of order n of a polynomial of degree n is a 
constant. This follows from the simple results on divided differences 
given below, which are obtained from definition : 

(a) Divided difference of $[g(x) \pm h(x)]$ = divided difference of
$g(x)$ $\pm$ divided difference of $h(x)$.

(b) Divided difference of cx f(x)=c x divided difference of f (x). 

(c) Divided difference of order n of x" is a constant. 

We may derive Lagrange’s formula from the definition of divided 
difference as follows : 

Let $f(x)$ be a polynomial of degree $n$ and suppose $f(x)$ is known
for $x = x_0, x_1, \ldots, x_n$. Then, from the definition of divided
difference,

$$f(x, x_0, x_1, \ldots, x_n) = \frac{f(x)}{(x-x_0)(x-x_1)\cdots(x-x_n)} + \frac{f(x_0)}{(x_0-x)(x_0-x_1)\cdots(x_0-x_n)} + \cdots + \frac{f(x_n)}{(x_n-x)(x_n-x_0)\cdots(x_n-x_{n-1})}.$$

But $f(x, x_0, x_1, \ldots, x_n) = 0$, being a divided difference of order
$(n+1)$ of a polynomial of degree $n$. We have, therefore,

$$f(x) = \frac{(x-x_1)(x-x_2)\cdots(x-x_n)}{(x_0-x_1)(x_0-x_2)\cdots(x_0-x_n)}\,f(x_0) + \frac{(x-x_0)(x-x_2)\cdots(x-x_n)}{(x_1-x_0)(x_1-x_2)\cdots(x_1-x_n)}\,f(x_1) + \cdots + \frac{(x-x_0)(x-x_1)\cdots(x-x_{n-1})}{(x_n-x_0)(x_n-x_1)\cdots(x_n-x_{n-1})}\,f(x_n),$$

and this is the Lagrange formula.


When the values of the argument are equidistant, we may 
compute both finite and divided differences. We give below the 
relations between these two kinds of differences when the values of 
the argument are equidistant : 


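$$f(x, x+h) = \frac{f(x+h) - f(x)}{h} = \frac{\Delta f(x)}{h},$$

$$f(x, x+h, x+2h) = \frac{f(x+h, x+2h) - f(x, x+h)}{2h} = \frac{\Delta f(x+h) - \Delta f(x)}{2h^2} = \frac{\Delta^2 f(x)}{2h^2},$$

and, in general,

$$f(x, x+h, \ldots, x+nh) = \frac{\Delta^n f(x)}{n!\,h^n}. \qquad (2.24)$$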


2b.9 Newton’s divided difference formula 
Let $f(x)$ be a function whose divided difference of order $n$ is a
constant. Suppose $f(x_0), f(x_1), \ldots, f(x_n)$ are known and $x$ is any
other value of the argument for which we want the value of $f(x)$.
Now,

$$f(x, x_0, x_1, \ldots, x_{n-1}) = f(x_0, x_1, \ldots, x_n),$$

since, by assumption, the $n$th order divided difference is a constant.
So

$$\frac{f(x, x_0, \ldots, x_{n-2}) - f(x_0, x_1, \ldots, x_{n-1})}{x - x_{n-1}} = f(x_0, x_1, \ldots, x_n)$$

or

$$f(x, x_0, \ldots, x_{n-2}) = f(x_0, x_1, \ldots, x_{n-1}) + (x - x_{n-1})\,f(x_0, x_1, \ldots, x_n).$$

Again, writing

$$f(x, x_0, \ldots, x_{n-2}) = \frac{f(x, x_0, \ldots, x_{n-3}) - f(x_0, x_1, \ldots, x_{n-2})}{x - x_{n-2}},$$

we have

$$f(x, x_0, \ldots, x_{n-3}) = f(x_0, x_1, \ldots, x_{n-2}) + (x - x_{n-2})\,f(x_0, x_1, \ldots, x_{n-1}) + (x - x_{n-2})(x - x_{n-1})\,f(x_0, x_1, \ldots, x_n).$$

Proceeding similarly, we get finally

$$f(x) = f(x_0) + (x-x_0)\,f(x_0, x_1) + (x-x_0)(x-x_1)\,f(x_0, x_1, x_2) + \cdots + (x-x_0)(x-x_1)\cdots(x-x_{n-1})\,f(x_0, x_1, \ldots, x_n). \qquad (2.25)$$

This is known as Newton's divided difference formula.
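The construction of a divided difference table and the evaluation of formula (2.25) can be sketched as follows. This is an illustrative sketch only: the function names and the sample data are assumptions, not taken from the text. Only the top edge of the table, $f(x_0), f(x_0,x_1), \ldots, f(x_0,\ldots,x_n)$, is needed by the formula.

```python
def divided_differences(xs, fs):
    """Return [f(x0), f(x0,x1), ..., f(x0,...,xn)], the leading entries
    of the successive columns of the divided difference table."""
    column = list(fs)
    leading = [column[0]]
    for order in range(1, len(xs)):
        # entry i of the new column is f(x_i, ..., x_{i+order})
        column = [(column[i + 1] - column[i]) / (xs[i + order] - xs[i])
                  for i in range(len(column) - 1)]
        leading.append(column[0])
    return leading

def newton_divided(xs, fs, x):
    """Evaluate Newton's divided difference formula (2.25) at x."""
    total, product = 0.0, 1.0
    for k, coeff in enumerate(divided_differences(xs, fs)):
        total += coeff * product
        product *= (x - xs[k])      # builds (x - x0)(x - x1)...(x - xk)
    return total

# Unequally spaced arguments and a cubic: the third-order divided
# difference is then a constant (the leading coefficient, 1).
xs = [0, 1, 3, 7]
fs = [t**3 - 2 * t + 5 for t in xs]
leading = divided_differences(xs, fs)
estimate = newton_divided(xs, fs, 2.5)
```

One practical advantage over Lagrange's form is that adding a further tabulated value only appends one more term; the earlier terms are unchanged.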




2b.10 Central difference formulae

In this section, we shall consider interpolation formulae that
employ differences lying close to a horizontal line drawn through the
central part of a diagonal difference table. The advantage of these
formulae over the formulae considered in previous sections is that
the former converge more rapidly than the latter when the value
required is near the central part of a table. As a result, these central
difference formulae are useful for interpolating near the middle of a
series of values of the argument. The two most important central
difference formulae are Stirling's and Bessel's formulae. We next
obtain the Newton-Gauss forward and backward formulae, and they
in turn will give us Stirling's and Bessel's formulae.


2b.10.1 Newton-Gauss forward formula (with 2n+1 equidistant 
values of the argument) 
Suppose $f(x)$ is known for

$$x = x_0 - nh, \ldots, x_0 - 2h, x_0 - h, x_0, x_0 + h, x_0 + 2h, \ldots, x_0 + nh.$$

Let us take

$$x_0 = x_0,\; x_1 = x_0 + h,\; x_2 = x_0 - h,\; x_3 = x_0 + 2h,\; x_4 = x_0 - 2h,\; \ldots,\; x_{2n-1} = x_0 + nh \text{ and } x_{2n} = x_0 - nh$$

in Newton's divided difference formula. We have then

$$f(x) = f(x_0) + (x-x_0)\,f(x_0, x_0+h) + (x-x_0)(x-x_0-h)\,f(x_0, x_0+h, x_0-h)$$
$$\quad + (x-x_0)(x-x_0-h)(x-x_0+h)\,f(x_0, x_0+h, x_0-h, x_0+2h) + \cdots$$
$$\quad + (x-x_0)(x-x_0-h)(x-x_0+h)\cdots(x-x_0-nh)\,f(x_0, x_0+h, x_0-h, \ldots, x_0+nh, x_0-nh).$$

Replacing the divided differences by finite differences, we get

$$f(x) = f(x_0) + \frac{x-x_0}{h}\,\Delta f(x_0) + \frac{(x-x_0)(x-x_0-h)}{2!\,h^2}\,\Delta^2 f(x_0-h) + \frac{(x-x_0)(x-x_0-h)(x-x_0+h)}{3!\,h^3}\,\Delta^3 f(x_0-h) + \cdots$$
$$\quad + \frac{(x-x_0)(x-x_0-h)(x-x_0+h)\cdots(x-x_0-nh)}{(2n)!\,h^{2n}}\,\Delta^{2n} f(x_0-nh).$$

Let us put $u = (x-x_0)/h$. Then

$$f(x) = f(x_0+uh) = f(x_0) + u\,\Delta f(x_0) + \binom{u}{2}\Delta^2 f(x_0-h) + \binom{u+1}{3}\Delta^3 f(x_0-h) + \cdots + \binom{u+n-1}{2n}\Delta^{2n} f(x_0-nh). \qquad (2.26)$$

And this is the Newton-Gauss forward formula.


2b.10.2 Newton-Gauss backward formula (with 2n+1 equidistant
values of the argument)
As for the previous formula, we assume that $f(x)$ is known for

$$x = x_0 - nh, \ldots, x_0 - h, x_0, x_0 + h, \ldots, x_0 + nh.$$

But now we take

$$x_0 = x_0,\; x_1 = x_0 - h,\; x_2 = x_0 + h,\; x_3 = x_0 - 2h,\; x_4 = x_0 + 2h,\; \ldots,\; x_{2n-1} = x_0 - nh \text{ and } x_{2n} = x_0 + nh$$

in Newton's divided difference formula. We have

$$f(x) = f(x_0) + (x-x_0)\,f(x_0, x_0-h) + (x-x_0)(x-x_0+h)\,f(x_0, x_0-h, x_0+h)$$
$$\quad + (x-x_0)(x-x_0+h)(x-x_0-h)\,f(x_0, x_0-h, x_0+h, x_0-2h) + \cdots$$
$$\quad + (x-x_0)(x-x_0+h)(x-x_0-h)\cdots(x-x_0+nh)\,f(x_0, x_0-h, x_0+h, \ldots, x_0-nh, x_0+nh).$$

Replacing the divided differences by finite differences, we get

$$f(x) = f(x_0) + \frac{x-x_0}{h}\,\Delta f(x_0-h) + \frac{(x-x_0)(x-x_0+h)}{2!\,h^2}\,\Delta^2 f(x_0-h) + \frac{(x-x_0)(x-x_0+h)(x-x_0-h)}{3!\,h^3}\,\Delta^3 f(x_0-2h) + \cdots$$
$$\quad + \frac{(x-x_0)(x-x_0+h)(x-x_0-h)\cdots(x-x_0+nh)}{(2n)!\,h^{2n}}\,\Delta^{2n} f(x_0-nh).$$

Let us put $u = (x-x_0)/h$. Then

$$f(x) = f(x_0+uh) = f(x_0) + u\,\Delta f(x_0-h) + \binom{u+1}{2}\Delta^2 f(x_0-h) + \binom{u+1}{3}\Delta^3 f(x_0-2h) + \cdots + \binom{u+n}{2n}\Delta^{2n} f(x_0-nh). \qquad (2.27)$$

This is the Newton-Gauss backward formula.


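2b.10.3 Stirling's formula (with 2n+1 equidistant values of the
argument)

Let $f(x)$ be known for $x = x_0, x_0 \pm h, x_0 \pm 2h, \ldots, x_0 \pm nh$, and
suppose $f(x)$ is to be determined for $x_0 - h < x < x_0 + h$. Stirling's
formula, which is meant for problems of this type, may be obtained
by averaging the Newton-Gauss forward and backward formulae as
follows:

$$f(x) = f(x_0+uh) = f(x_0) + u\,\frac{\Delta f(x_0) + \Delta f(x_0-h)}{2} + \frac{\binom{u}{2} + \binom{u+1}{2}}{2}\,\Delta^2 f(x_0-h) + \binom{u+1}{3}\frac{\Delta^3 f(x_0-h) + \Delta^3 f(x_0-2h)}{2}$$
$$\quad + \frac{\binom{u+1}{4} + \binom{u+2}{4}}{2}\,\Delta^4 f(x_0-2h) + \cdots + \frac{\binom{u+n-1}{2n} + \binom{u+n}{2n}}{2}\,\Delta^{2n} f(x_0-nh).$$

It contains alternately mean coefficients and mean differences of
the formulae from which it is obtained by averaging.
On simplification, this reduces to

$$f(x) = f(x_0+uh) = f(x_0) + u\,\frac{\Delta f(x_0) + \Delta f(x_0-h)}{2} + \frac{u^2}{2!}\,\Delta^2 f(x_0-h) + \frac{u(u^2-1^2)}{3!}\cdot\frac{\Delta^3 f(x_0-h) + \Delta^3 f(x_0-2h)}{2}$$
$$\quad + \frac{u^2(u^2-1^2)}{4!}\,\Delta^4 f(x_0-2h) + \cdots + \frac{u^2(u^2-1^2)\cdots\left(u^2-\overline{n-1}^{\,2}\right)}{(2n)!}\,\Delta^{2n} f(x_0-nh). \qquad (2.28)$$

This is Stirling's formula.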


2b.10.4 Bessel's formula (with 2n+2 equidistant values of the
argument)
Let $f(x)$ be known for $x = x_0, x_0 \pm h, x_0 \pm 2h, \ldots, x_0 \pm nh, x_0 + (n+1)h$
and suppose $f(x)$ is to be determined for $x_0 - h < x < x_0 + h$.
Bessel's formula may be obtained by averaging the Newton-Gauss
forward formula starting with $f(x_0)$ and the Newton-Gauss backward
formula starting with $f(x_0+h)$. These formulae based on the above-
stated $(2n+2)$ equidistant values of the argument are

$$f(x) = f(x_0) + u\,\Delta f(x_0) + \binom{u}{2}\Delta^2 f(x_0-h) + \cdots + \binom{u+n-1}{2n}\Delta^{2n} f(x_0-nh) + \binom{u+n}{2n+1}\Delta^{2n+1} f(x_0-nh)$$

and

$$f(x) = f(x_0+h) + (u-1)\,\Delta f(x_0) + \binom{u}{2}\Delta^2 f(x_0) + \cdots + \binom{u+n-1}{2n}\Delta^{2n} f(x_0-\overline{n-1}\,h) + \binom{u+n-1}{2n+1}\Delta^{2n+1} f(x_0-nh).$$

Averaging these, we get

$$f(x) = \frac{f(x_0) + f(x_0+h)}{2} + \left(u-\tfrac12\right)\Delta f(x_0) + \binom{u}{2}\frac{\Delta^2 f(x_0-h) + \Delta^2 f(x_0)}{2} + \cdots$$
$$\quad + \binom{u+n-1}{2n}\frac{\Delta^{2n} f(x_0-\overline{n-1}\,h) + \Delta^{2n} f(x_0-nh)}{2} + \frac{\binom{u+n}{2n+1} + \binom{u+n-1}{2n+1}}{2}\,\Delta^{2n+1} f(x_0-nh).$$

This is Bessel's formula.

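We thus see that Bessel's formula also contains alternately mean
coefficients and mean differences of the formulae from which it is
obtained by averaging.

By substituting $v = u - \tfrac12$, we can put the above formula in a neat
form as follows:

$$f(x) = \frac{f(x_0) + f(x_0+h)}{2} + v\,\Delta f(x_0) + \frac{v^2-\tfrac14}{2!}\cdot\frac{\Delta^2 f(x_0-h) + \Delta^2 f(x_0)}{2} + \frac{v\left(v^2-\tfrac14\right)}{3!}\,\Delta^3 f(x_0-h) + \cdots$$
$$\quad + \frac{\left(v^2-\tfrac14\right)\left(v^2-\tfrac94\right)\cdots\left(v^2-\tfrac{(2n-1)^2}{4}\right)}{(2n)!}\cdot\frac{\Delta^{2n} f(x_0-nh) + \Delta^{2n} f(x_0-\overline{n-1}\,h)}{2}$$
$$\quad + \frac{v\left(v^2-\tfrac14\right)\left(v^2-\tfrac94\right)\cdots\left(v^2-\tfrac{(2n-1)^2}{4}\right)}{(2n+1)!}\,\Delta^{2n+1} f(x_0-nh). \qquad (2.29)$$

An important special case of Bessel's formula is the formula for
interpolating to halves, which occurs when $v = 0$, i.e. when we are
interested in $f(x_0 + \tfrac12 h)$. In that case, all terms involving odd-order
differences vanish and the formula reduces to

$$f\left(x_0+\tfrac12 h\right) = \frac{f(x_0) + f(x_0+h)}{2} - \frac18\cdot\frac{\Delta^2 f(x_0-h) + \Delta^2 f(x_0)}{2} + \frac{3}{128}\cdot\frac{\Delta^4 f(x_0-2h) + \Delta^4 f(x_0-h)}{2} - \cdots \qquad (2.29a)$$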


We have already mentioned the situations in which Newton's
forward and backward and Lagrange's formulae for interpolation are
appropriate. In the beginning of Section 2b.10, we have also stated
that the central difference formulae are suitable for interpolating
near the central part of a tabulated set of values. More specifically,
it may be stated that Stirling's formula will give better results when
$-0.25 < u < 0.25$ whereas Bessel's will be appropriate for $0.25 < u < 0.75$
(equivalently, for $-0.25 < v < 0.25$). Stated differently, it means
that Stirling's formula will give more accurate results when
interpolating near the beginning or end of a central interval, whereas near
the middle of a central interval Bessel's formula will be appropriate.
The smaller the values of $u$ and $v$, the more rapidly will the formulae
(2.28) and (2.29) converge.


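2b.10.5 Laplace-Everett formula

Another central difference formula that is helpful in interpolating
within a central interval while subdividing an interval is known
as the Laplace-Everett formula.

This formula may be derived by replacing every odd-order
difference in the Newton-Gauss forward formula by the immediately
lower even-order difference. Thus remembering that

$$\Delta f(x_0) = f(x_0+h) - f(x_0),\quad \Delta^3 f(x_0-h) = \Delta^2 f(x_0) - \Delta^2 f(x_0-h),\ \text{etc.},$$

we have, starting from the Newton-Gauss forward formula,

$$f(x) = f(x_0) + u\,\Delta f(x_0) + \binom{u}{2}\Delta^2 f(x_0-h) + \binom{u+1}{3}\Delta^3 f(x_0-h) + \binom{u+1}{4}\Delta^4 f(x_0-2h) + \binom{u+2}{5}\Delta^5 f(x_0-2h) + \cdots$$

$$= f(x_0) + u\,[f(x_0+h) - f(x_0)] + \binom{u}{2}\Delta^2 f(x_0-h) + \binom{u+1}{3}\left[\Delta^2 f(x_0) - \Delta^2 f(x_0-h)\right]$$
$$\quad + \binom{u+1}{4}\Delta^4 f(x_0-2h) + \binom{u+2}{5}\left[\Delta^4 f(x_0-h) - \Delta^4 f(x_0-2h)\right] + \cdots$$

$$= u\,f(x_0+h) + \binom{u+1}{3}\Delta^2 f(x_0) + \binom{u+2}{5}\Delta^4 f(x_0-h) + \cdots$$
$$\quad + (1-u)\,f(x_0) + \left[\binom{u}{2} - \binom{u+1}{3}\right]\Delta^2 f(x_0-h) + \left[\binom{u+1}{4} - \binom{u+2}{5}\right]\Delta^4 f(x_0-2h) + \cdots$$

$$= u\,f(x_0+h) + \binom{u+1}{3}\Delta^2 f(x_0) + \binom{u+2}{5}\Delta^4 f(x_0-h) + \cdots$$
$$\quad + t\,f(x_0) + \binom{t+1}{3}\Delta^2 f(x_0-h) + \binom{t+2}{5}\Delta^4 f(x_0-2h) + \cdots, \qquad (2.30)$$

where $t = 1-u$.
This is the Laplace-Everett formula.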
Example 2.13 From the table given in Example 2.8, determine
(a) $e^{0.1520}$ and (b) $e^{0.1662}$.
Since the required values are near the central part of the table,
we shall use the appropriate central difference formulae.
(a) Here $h = 0.01$, $x = 0.1520$ and we take $x_0 = 0.15$, so that

$$u = \frac{0.1520 - 0.15}{0.01} = 0.20.$$

Since $x$ is near the beginning of a central interval (also $-0.25 < u < 0.25$),
we use Stirling's formula. Therefore, as an approximate
value of $e^{0.1520}$, we take

$$f(x_0) + u\,\frac{\Delta f(x_0) + \Delta f(x_0-h)}{2} + \frac{u^2}{2!}\,\Delta^2 f(x_0-h),$$

since second differences are approximately constant.
Thus

$$e^{0.1520} \approx 1.161834 + 0.20\times\frac{0.011677 + 0.011560}{2} + \frac{(0.20)^2}{2}\times 0.000117$$
$$= 1.161834 + 0.0023237 + 0.00000234$$
$$= 1.16416004,\ \text{i.e. } 1.164160 \text{ (correct to 6 decimal places)}.$$

(b) In this case also, $h = 0.01$, but $x = 0.1662$ and we take
$x_0 = 0.16$, so that

$$u = \frac{0.1662 - 0.16}{0.01} = 0.62$$

or $v = 0.12$.

As $x$ is near the middle of a central interval (also $-0.25 < v < 0.25$),
we use Bessel's formula. Therefore, as an approximate value
of $e^{0.1662}$, we have

$$\frac{f(x_0) + f(x_0+h)}{2} + v\,\Delta f(x_0) + \frac{v^2-\tfrac14}{2!}\cdot\frac{\Delta^2 f(x_0-h) + \Delta^2 f(x_0)}{2},$$

since second differences are approximately constant. Thus

$$e^{0.1662} \approx \frac{1.173511 + 1.185305}{2} + 0.12\times 0.011794 + \frac{(0.12)^2 - 0.25}{2}\times\frac{0.000117 + 0.000118}{2}$$
$$= 1.179408 + 0.00141528 - 0.0000138415$$
$$= 1.1808094385,\ \text{i.e. } 1.180809$$
(correct to 6 decimal places).
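The two computations of Example 2.13 can be retraced with a short sketch. The table of Example 2.8 is not reproduced here, so the sketch assumes it lists $e^x$ to six decimal places at intervals of 0.01 and regenerates those entries with `math.exp`; the helper names are illustrative. Both formulae are truncated after the second-difference term, as in the example.

```python
import math

h = 0.01
# Assumed stand-in for the table of Example 2.8: e^x rounded to six
# decimals at x = 0.13, 0.14, ..., 0.19 (keyed by the integer 100x).
table = {k: round(math.exp(k * h), 6) for k in range(13, 20)}

def f(k):
    return table[k]

def d1(k):
    """First difference: f at (k+1)h minus f at kh."""
    return f(k + 1) - f(k)

def d2(k):
    """Second difference at kh."""
    return d1(k + 1) - d1(k)

def stirling(k0, u):
    """Stirling's formula (2.28), terms up to the second difference."""
    return f(k0) + u * (d1(k0) + d1(k0 - 1)) / 2 + u * u / 2 * d2(k0 - 1)

def bessel(k0, u):
    """Bessel's formula (2.29), terms up to the second difference."""
    v = u - 0.5
    return ((f(k0) + f(k0 + 1)) / 2 + v * d1(k0)
            + (v * v - 0.25) / 2 * (d2(k0 - 1) + d2(k0)) / 2)

part_a = stirling(15, 0.20)   # x0 = 0.15, x = 0.1520
part_b = bessel(16, 0.62)     # x0 = 0.16, x = 0.1662
```

Under these assumptions both results agree with the values quoted in the example to six decimal places.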


2b.11 Remainder terms in interpolation formulz 

The different interpolation formulae discussed so far are
polynomials of various degrees, and these polynomials $\phi(x)$ coincide with
the given function $f(x)$ at the given values of the argument; i.e.,
$\phi(x) = f(x)$ for $x = x_0, x_1, x_2, \ldots, x_p$ ($x_0 < x_1 < \cdots < x_p$). But $\phi(x)$
is not necessarily the same as $f(x)$ for other values of $x$. In this
section we shall study the remainder term, $f(x) - \phi(x)$, for the
polynomial interpolation formulae.

$\phi(x)$ is a polynomial of degree $p$, where $p = n$ for Newton's forward,
Newton's backward and Lagrange's formulae, $p = 2n$ for Stirling's
and $p = 2n+1$ for Bessel's formula. So the $(p+1)$st derivative of
$\phi(x)$ with respect to $x$ is zero. Let us define an arbitrary function $F$
as follows:

$$F(z) = f(z) - \phi(z) - [f(x) - \phi(x)]\,\frac{(z-x_0)(z-x_1)\cdots(z-x_p)}{(x-x_0)(x-x_1)\cdots(x-x_p)},$$

where $x_0, x_1, \ldots, x_p$ are the given values of the argument
corresponding to which $f(x)$ is known. $(z-x_0)(z-x_1)\cdots(z-x_p)$ is a
polynomial of degree $(p+1)$ in $z$, and the $(p+1)$st derivative of it
with respect to $z$ is $(p+1)!$. Let us assume that $f(x)$ is continuous
and has continuous derivatives of all orders in the interval from $x_0$
to $x_p$. Then the same is true for $F(z)$ also, and further, $F(z) = 0$ for
$z = x, x_0, x_1, \ldots, x_p$. By repeated application of Rolle's theorem, we
find that there exists $\xi$, $x_0 < \xi < x_p$, such that the $(p+1)$st derivative
of $F$ at $\xi$ is zero, i.e. $F^{(p+1)}(\xi) = 0$.
From the above defining equation of $F(z)$, we have

$$F^{(p+1)}(z) = f^{(p+1)}(z) - [f(x) - \phi(x)]\,\frac{(p+1)!}{(x-x_0)(x-x_1)\cdots(x-x_p)},$$

and so at $z = \xi$,

$$f(x) - \phi(x) = (x-x_0)(x-x_1)\cdots(x-x_p)\,\frac{f^{(p+1)}(\xi)}{(p+1)!}. \qquad (2.31)$$

This is the remainder term in the interpolation formula $\phi(x)$.
We state below the forms of the remainder term (2.31) for the
different formulae:
The remainder term in Lagrange's and Newton's forward and
backward formulae is

$$(x-x_0)(x-x_1)\cdots(x-x_n)\,\frac{f^{(n+1)}(\xi)}{(n+1)!}. \qquad (2.32)$$

The remainder term in Stirling's formula is

$$(x-x_{-n})(x-x_{-n+1})\cdots(x-x_0)\cdots(x-x_n)\,\frac{f^{(2n+1)}(\xi)}{(2n+1)!}. \qquad (2.33)$$

The remainder term in Bessel's formula is

$$(x-x_{-n})(x-x_{-n+1})\cdots(x-x_0)\cdots(x-x_n)(x-x_{n+1})\,\frac{f^{(2n+2)}(\xi)}{(2n+2)!}. \qquad (2.34)$$

Here $x_r = x_0 + rh$ and $x_{-r} = x_0 - rh$.

When the analytical form of $f(x)$ is unknown, $f^{(p+1)}(\xi)$ may be
replaced by an appropriate difference of $f(x)$, using the relations
between differences and derivatives.


2b.12 Bivariate interpolation 

We have so far considered the problem of interpolation for a 
function of one argument. Sometimes the function may depend 
on two arguments, and we may have to interpolate for both the 
arguments. This may be done by first interpolating with respect to 
one argument and then interpolating with respect to the other. The 
work then comes down to applying the appropriate formula twice 
and, as such, gives rise to no new problem.

A different solution of the problem can be obtained if we use a 
formula similar to Newton’s forward formula but modified to take 
account of two arguments instead of one. 




Suppose the function is denoted by $z = f(x, y)$, where $x$ and $y$ are
the two arguments. We shall consider the case of equidistant values
of both the arguments with common differences $h$ and $k$,
respectively. We next define the two-way differences

$$\Delta^{1,0} f(x, y) = f(x+h, y) - f(x, y) = (E_x - 1)\,f(x, y)$$

and

$$\Delta^{0,1} f(x, y) = f(x, y+k) - f(x, y) = (E_y - 1)\,f(x, y).$$

Similarly, we define

$$E_x^u E_y^v f(x, y) = f(x+hu, y+kv).$$

Then we have the following equivalence relation:

$$\Delta^{m,n} \equiv \Delta_x^m \Delta_y^n,$$

where $\Delta_x \equiv E_x - 1$, $\Delta_y \equiv E_y - 1$.

Now we give the general bivariate interpolation formula for the
case of equidistant values of the arguments:

$$f(x, y) = f(x_0+hu, y_0+kv) = E_x^u E_y^v f(x_0, y_0) = (1+\Delta_x)^u (1+\Delta_y)^v f(x_0, y_0)$$
$$= f(x_0, y_0) + u\,\Delta^{1,0} f(x_0, y_0) + v\,\Delta^{0,1} f(x_0, y_0) + \binom{u}{2}\Delta^{2,0} f(x_0, y_0) + \binom{v}{2}\Delta^{0,2} f(x_0, y_0) + uv\,\Delta^{1,1} f(x_0, y_0) + \cdots \qquad (2.35)$$

This formula corresponds to Newton's forward formula: if we
put either $u = 0$ or $v = 0$, we get Newton's forward formula for the
univariate case. The formula for linear interpolation for both the
arguments is obtained from (2.35), by ignoring differences higher
than $\Delta^{1,0}$ or $\Delta^{0,1}$, as

$$f(x, y) = f(x_0, y_0) + u\,\Delta^{1,0} f(x_0, y_0) + v\,\Delta^{0,1} f(x_0, y_0). \qquad (2.36)$$

This linear formula can be easily extended to the case of functions
of more than two arguments. Thus

$$f(x, y, z) = f(x_0, y_0, z_0) + u\,\Delta_x f(x_0, y_0, z_0) + v\,\Delta_y f(x_0, y_0, z_0) + w\,\Delta_z f(x_0, y_0, z_0), \qquad (2.37)$$

where $u = (x-x_0)/h$, $v = (y-y_0)/k$ and $w = (z-z_0)/l$.
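As a small illustration of the linear bivariate formula (2.36), the sketch below uses a hypothetical test function that is linear in both arguments, for which the formula is exact. The function name and the numerical values are assumptions chosen only for the illustration.

```python
def linear_bivariate(f00, f10, f01, u, v):
    """Formula (2.36): f(x0 + hu, y0 + kv) from f(x0, y0), f(x0 + h, y0)
    and f(x0, y0 + k), using only the two first-order differences."""
    d10 = f10 - f00     # first difference in the x direction
    d01 = f01 - f00     # first difference in the y direction
    return f00 + u * d10 + v * d01

def g(x, y):
    """Hypothetical tabulated function, linear in x and in y."""
    return 2 + 3 * x + 4 * y

x0, y0, h, k = 1.0, 2.0, 0.5, 0.25
x, y = 1.2, 2.1
u, v = (x - x0) / h, (y - y0) / k

estimate = linear_bivariate(g(x0, y0), g(x0 + h, y0), g(x0, y0 + k), u, v)
```

For a function with curvature the second-order terms of (2.35) would have to be retained; the linear form is adequate only when those differences are negligible.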


2c NUMERICAL DIFFERENTIATION


The basic idea involved here is to approximate the given
function $y = f(x)$ over a short interval of $x$ by a suitable polynomial
interpolation formula $\phi(x)$ and then to differentiate that formula
rather than the original function.

So we can obtain an estimate of the value of the derivative of a
function $f(x)$ at a given value of $x$ even though the algebraic form
of $f(x)$ is not known. To do this we require a table of values of
$f(x)$. This process of calculating the derivative of a function, with
the help of the approximating interpolation formula and given a set
of values of the function, is known as numerical differentiation. The
problem is solved by selecting an appropriate interpolation formula
and then differentiating it term by term as many times as is desired.
If the given set of values of $f(x)$ are at equidistant values of $x$, then
we choose an interpolation formula using differences. If, moreover,
the derivative required is at the beginning (end or central part) of
the tabulated values, then we select Newton's forward (Newton's
backward or a central difference) formula. Otherwise, we use
Lagrange's formula or a divided difference formula.

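We now obtain a general relation connecting the operators $\Delta$
and $D \equiv d/dx$. We assume that the Taylor expansion of $f(x)$ exists.
Then $f(x+h)$ can be expressed as

$$f(x+h) = f(x) + h\,f'(x) + \frac{h^2}{2!}\,f''(x) + \cdots$$

or

$$E f(x) = \left[1 + hD + \frac{h^2D^2}{2!} + \cdots\right] f(x) = e^{hD} f(x),$$

so that we have the equivalence relation:

$$E \equiv e^{hD}$$

or

$$\Delta \equiv e^{hD} - 1. \qquad (2.38)$$

So, knowing $\Delta f(x)$, $\Delta^2 f(x)$, etc., we can find $f'(x)$ using the relation

$$D = \frac{1}{h}\ln(1+\Delta) = \frac{1}{h}\left[\Delta - \frac{\Delta^2}{2} + \frac{\Delta^3}{3} - \cdots\right], \qquad (2.39)$$

where $\ln y$ stands for the logarithm of $y$ with respect to the base $e$
(i.e. the natural logarithm of $y$), or an equivalent relation:

$$D = -\frac{1}{h}\ln\left(\frac{1}{1+\Delta}\right) = -\frac{1}{h}\ln\left(1 - \Delta E^{-1}\right) = \frac{1}{h}\left[\Delta E^{-1} + \frac{\Delta^2 E^{-2}}{2} + \frac{\Delta^3 E^{-3}}{3} + \cdots\right]. \qquad (2.39a)$$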


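Example 2.14 Let us obtain the value of $d(e^{\circ}_x)/dx$ at $x = 0.5$ using
the table in Example 2.6.

Since the value required is at the beginning of a set of equidistant
values of the tabulated function, we take Newton's forward formula
and differentiate it. We take formula (2.12).

In the present case,

$$x_0 = 0,\quad x = 0.5,\quad h = 5,$$

so $u = 0.1$ and

$$\frac{d e^{\circ}_x}{dx} = \frac{d e^{\circ}_x}{du}\cdot\frac{du}{dx} = \frac{1}{h}\,\frac{d e^{\circ}_x}{du} = \frac{1}{5}\,\frac{d e^{\circ}_x}{du}.$$

Now we differentiate formula (2.12) retaining in it terms involving
differences up to $\Delta^4(e^{\circ}_0)$. We are given five values of $e^{\circ}_x$ and so
assume $\Delta^4 e^{\circ}_x$ is a constant.

Thus

$$\frac{d(e^{\circ}_x)}{du} = \Delta e^{\circ}_0 + \frac{2u-1}{2}\,\Delta^2 e^{\circ}_0 + \frac{3u^2-6u+2}{6}\,\Delta^3 e^{\circ}_0 + \frac{4u^3-18u^2+22u-6}{24}\,\Delta^4 e^{\circ}_0,$$

so that

$$\left[\frac{d(e^{\circ}_x)}{dx}\right]_{x=0.5} = \frac{1}{5}\left[\frac{d(e^{\circ}_x)}{du}\right]_{u=0.1}$$
$$= \frac{1}{5}\left[6.40 - 0.4\times(-9.04) + \frac{1.43}{6}\times 8.06 - \frac{3.976}{24}\times(-6.83)\right]$$
$$= \frac{1}{5}\,[6.40 + 3.616 + 1.9209 + 1.1315]$$
$$= \frac{13.0684}{5} = 2.6137.$$

We can repeat this process and obtain the values of the higher-order
derivatives.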
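The differentiation of Newton's forward formula carried out in Example 2.14 can be sketched as below. The function name is an illustrative assumption, and the four leading differences are those appearing in the worked arithmetic of the example (the table of Example 2.6 itself is not reproduced here).

```python
def newton_forward_derivative(diffs, u, h):
    """Derivative with respect to x of Newton's forward formula truncated
    at the fourth difference. `diffs` holds the first to fourth leading
    differences taken at the first tabulated argument."""
    d1, d2, d3, d4 = diffs
    # term-by-term derivative with respect to u of
    # f0 + u d1 + C(u,2) d2 + C(u,3) d3 + C(u,4) d4
    dfdu = (d1
            + (2 * u - 1) / 2 * d2
            + (3 * u**2 - 6 * u + 2) / 6 * d3
            + (4 * u**3 - 18 * u**2 + 22 * u - 6) / 24 * d4)
    return dfdu / h          # chain rule: du/dx = 1/h

# Differences as they appear in the arithmetic of Example 2.14.
slope = newton_forward_derivative((6.40, -9.04, 8.06, -6.83), u=0.1, h=5)

# Independent check on x^4 tabulated at 0, 1, 2, 3, 4 (leading differences
# 1, 14, 36, 24): the derivative at x = 0.5 is 4 * 0.5**3 = 0.5.
check = newton_forward_derivative((1, 14, 36, 24), u=0.5, h=1)
```

Since five tabulated values determine a fourth-degree polynomial, the result is exact for the quartic check and only approximate for the tabulated function of the example.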




2d NUMERICAL INTEGRATION 


In this section, we shall consider some simple approximate
methods of finding the value of a definite integral from a given set of
numerical values of the integrand. This process is also known as
mechanical quadrature when the integrand is a function of a single
variable.

We replace the integrand by a suitable interpolation formula,
usually one involving differences, and then integrate it term by term
between the desired limits. We can get different quadrature formulae,
as they are called, by replacing the integrand by different
interpolation formulae or retaining, in the same formula, terms up to different
orders of difference. We shall obtain below some quadrature
formulae by integrating Newton's forward formula.

In Newton's formula, $u = (x-a_0)/h$ and so $dx = h\,du$; and if the
limits of integration for $x$ are $a_0$ and $a_n = a_0 + nh$, the limits in terms
of $u$ will be 0 and $n$. Hence

$$\int_{a_0}^{a_0+nh} f(x)\,dx = h\int_0^n \left[f(a_0) + u\,\Delta f(a_0) + \binom{u}{2}\Delta^2 f(a_0) + \binom{u}{3}\Delta^3 f(a_0) + \cdots\right] du$$
$$= h\left[n\,f(a_0) + \frac{n^2}{2}\,\Delta f(a_0) + \left(\frac{n^3}{3} - \frac{n^2}{2}\right)\frac{\Delta^2 f(a_0)}{2!} + \left(\frac{n^4}{4} - n^3 + n^2\right)\frac{\Delta^3 f(a_0)}{3!} + \cdots\right]. \qquad (2.40)$$

From this general formula, we obtain the following particular
formulae.
2d.1 Trapezoidal rule

Here we assume that the integrand is such that it can be well
represented by a straight line in any interval of width $h$. That
means $f(x)$ can be replaced by a first-degree polynomial or,
equivalently, $\Delta f(x)$ can be regarded as a constant. Accordingly, putting in
(2.40) $n = 1$ and neglecting differences of all orders higher than the
first, we get

$$\int_{a_0}^{a_1} f(x)\,dx = h\left[f(a_0) + \tfrac12\,\Delta f(a_0)\right] = \frac{h}{2}\,[f(a_0) + f(a_1)].$$

Similarly, we have, for the other intervals,

$$\int_{a_1}^{a_2} f(x)\,dx = \frac{h}{2}\,[f(a_1) + f(a_2)],$$

etc.
Adding all these, we get finally

$$\int_{a_0}^{a_n} f(x)\,dx = \int_{a_0}^{a_1} f(x)\,dx + \int_{a_1}^{a_2} f(x)\,dx + \cdots + \int_{a_{n-1}}^{a_n} f(x)\,dx$$
$$= \frac{h}{2}\,[f(a_0) + f(a_1)] + \frac{h}{2}\,[f(a_1) + f(a_2)] + \cdots + \frac{h}{2}\,[f(a_{n-1}) + f(a_n)]$$
$$= \frac{h}{2}\,[f(a_0) + 2f(a_1) + 2f(a_2) + \cdots + 2f(a_{n-1}) + f(a_n)]. \qquad (2.41)$$

This is known as the trapezoidal rule. It is useful where $h$ is small,
for any small segment of a smooth curve can be approximated
by a straight line. Geometrically, this rule means that we are
replacing the graph of $y = f(x)$ between $a_0$ and $a_0 + nh$ by $n$ segments
of straight lines and then approximating the area under the curve
by that of a polygon.


2d.2 Simpson's one-third rule
Here we assume that the integrand is such that it can be replaced
by a second-degree polynomial over any interval of width $2h$.
Accordingly, we put in (2.40) $n = 2$ and ignore differences of all
orders above the second. We have then

$$\int_{a_0}^{a_2} f(x)\,dx = h\left[2f(a_0) + 2\,\Delta f(a_0) + \left(\frac83 - 2\right)\frac{\Delta^2 f(a_0)}{2}\right] = \frac{h}{3}\,[f(a_0) + 4f(a_1) + f(a_2)].$$

Similarly, we have for the next interval

$$\int_{a_2}^{a_4} f(x)\,dx = \frac{h}{3}\,[f(a_2) + 4f(a_3) + f(a_4)],$$

and so on for the other intervals.

Finally, we have (assuming $n$ is even)

$$\int_{a_0}^{a_n} f(x)\,dx = \int_{a_0}^{a_2} f(x)\,dx + \int_{a_2}^{a_4} f(x)\,dx + \cdots + \int_{a_{n-2}}^{a_n} f(x)\,dx$$
$$= \frac{h}{3}\,[f(a_0) + 4f(a_1) + f(a_2)] + \frac{h}{3}\,[f(a_2) + 4f(a_3) + f(a_4)] + \cdots + \frac{h}{3}\,[f(a_{n-2}) + 4f(a_{n-1}) + f(a_n)]$$
$$= \frac{h}{3}\,[f(a_0) + 4f(a_1) + 2f(a_2) + 4f(a_3) + 2f(a_4) + \cdots + 2f(a_{n-2}) + 4f(a_{n-1}) + f(a_n)]. \qquad (2.42)$$

This is known as Simpson's rule or Simpson's one-third rule.

The rule is simple, accurate and the most useful of all the
quadrature formulae. In this case, we have assumed that the interval
is divided into an even number of sub-intervals, and geometrically,
the procedure means that we have replaced the graph of the given
function by $n/2$ arcs of second-degree polynomials.


2d.3 Weddle's rule
Here we replace the integrand by a sixth-degree polynomial over
any interval of width $6h$. Accordingly, we put $n = 6$ and ignore
differences of all orders above the sixth in (2.40). We have then,
after some simplifications,

$$\int_{a_0}^{a_6} f(x)\,dx = h\left[6f(a_0) + 18\,\Delta f(a_0) + 27\,\Delta^2 f(a_0) + 24\,\Delta^3 f(a_0) + \frac{123}{10}\,\Delta^4 f(a_0) + \frac{33}{10}\,\Delta^5 f(a_0) + \frac{41}{140}\,\Delta^6 f(a_0)\right].$$

The coefficient of $\Delta^6 f(a_0)$ differs from 3/10 by a small fraction, 1/140.
Making this change in the coefficient of $\Delta^6 f(a_0)$, which will be
negligible if $\Delta^6 f(a_0)$ is small, we get

$$\int_{a_0}^{a_6} f(x)\,dx = \frac{3h}{10}\,[f(a_0) + 5f(a_1) + f(a_2) + 6f(a_3) + f(a_4) + 5f(a_5) + f(a_6)].$$

For the next interval, we have

$$\int_{a_6}^{a_{12}} f(x)\,dx = \frac{3h}{10}\,[f(a_6) + 5f(a_7) + f(a_8) + 6f(a_9) + f(a_{10}) + 5f(a_{11}) + f(a_{12})],$$

and so on for the other intervals.
Finally, we have (assuming $n$ is a multiple of 6)

$$\int_{a_0}^{a_n} f(x)\,dx = \frac{3h}{10}\,[f(a_0) + 5f(a_1) + f(a_2) + 6f(a_3) + f(a_4) + 5f(a_5) + 2f(a_6) + \cdots$$
$$\quad + 2f(a_{n-6}) + 5f(a_{n-5}) + f(a_{n-4}) + 6f(a_{n-3}) + f(a_{n-2}) + 5f(a_{n-1}) + f(a_n)]. \qquad (2.43)$$

This is Weddle's rule.

It is the most accurate of the above formulae. In usefulness it is
second only to Simpson's rule. Geometrically, using Weddle's rule
means that one has replaced the graph of $y = f(x)$ in the interval $a_0$
to $a_n = a_0 + nh$ by $n/6$ arcs of sixth-degree polynomials.

Similarly, by replacing $f(x)$ by higher-degree polynomials, one
can get other quadrature formulae. Also, one can get central
difference quadrature formulae by replacing $f(x)$ by some central
difference interpolation formula and integrating it between the
desired limits.


2d.4 Relative accuracy of quadrature formulae

In this section, we shall obtain the error terms of the quadrature
formulae discussed in Sections 2d.1 to 2d.3 and in Exercise 2.11. In
order that we may compare the error terms and make a comparative
assessment of the formulae, we must apply each to the same interval
and divide it into the same number of subintervals. The minimum
number of subintervals needed for this purpose is, of course, six.

Let us, therefore, evaluate

$$I = \int_{a-3h}^{a+3h} f(x)\,dx$$

by all the four formulae, after subdividing the interval into six
subintervals of width $h$ each. We also need the true value of $I$.

We assume that the integrand $f(x)$ is continuous and possesses
continuous derivatives of all orders in the interval $(a-3h, a+3h)$.
Let us also assume that the Taylor expansion of $f(x)$ exists in this
interval. Let the indefinite integral of $f(x)$ be denoted by $F(x) + c$.
Then the true value of the integral is

$$I = \int_{a-3h}^{a+3h} f(x)\,dx = F(a+3h) - F(a-3h) = 6h\,f(a) + 9h^3 f''(a) + \frac{81}{20}\,h^5 f^{iv}(a) + \frac{243}{280}\,h^7 f^{vi}(a) + \cdots, \qquad (2.44)$$

which is obtained by using Taylor's expansion of $F$ and using the
fact that $F'(a) = f(a)$, $F''(a) = f'(a)$, etc.

Now, the approximate value of $\int_{a-3h}^{a+3h} f(x)\,dx$ as given by the
trapezoidal rule is, say,

$$I_T = \frac{h}{2}\,[f(a-3h) + f(a+3h) + 2\{f(a-2h) + f(a-h) + f(a) + f(a+h) + f(a+2h)\}].$$

Next, we use Taylor's expansion of the functions and get

$$I_T = 6h\,f(a) + \frac{19}{2}\,h^3 f''(a) + \frac{115}{24}\,h^5 f^{iv}(a) + \frac{859}{720}\,h^7 f^{vi}(a) + \cdots \qquad (2.45)$$

Then the error in the trapezoidal rule is

$$E_T = I - I_T = -\frac{h^3}{2}\,f''(a) - \frac{89}{120}\,h^5 f^{iv}(a) - \cdots \qquad (2.46)$$

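Now we obtain the approximate value of the integral by the
one-third rule and proceed as before. We have

$$I_{1/3} = \frac{h}{3}\,[f(a-3h) + f(a+3h) + 4\{f(a-2h) + f(a) + f(a+2h)\} + 2\{f(a-h) + f(a+h)\}]$$
$$= 6h\,f(a) + 9h^3 f''(a) + \frac{49}{12}\,h^5 f^{iv}(a) + \frac{329}{360}\,h^7 f^{vi}(a) + \cdots \qquad (2.47)$$

Thus the error in the one-third rule is

$$E_{1/3} = I - I_{1/3} = -\frac{h^5}{30}\,f^{iv}(a) - \frac{29}{630}\,h^7 f^{vi}(a) - \cdots = -\frac{h^5}{30}\left[f^{iv}(a) + \frac{29}{21}\,h^2 f^{vi}(a) + \cdots\right]. \qquad (2.48)$$

Similarly, for the three-eighths rule (vide Exercise 2.11),

$$I_{3/8} = \frac{3h}{8}\,[f(a-3h) + f(a+3h) + 3\{f(a-2h) + f(a+2h) + f(a-h) + f(a+h)\} + 2f(a)]$$
$$= 6h\,f(a) + 9h^3 f''(a) + \frac{33}{8}\,h^5 f^{iv}(a) + \frac{77}{80}\,h^7 f^{vi}(a) + \cdots \qquad (2.49)$$

Hence the error in the three-eighths rule is given by

$$E_{3/8} = I - I_{3/8} = -\frac{3}{40}\,h^5 f^{iv}(a) - \frac{53}{560}\,h^7 f^{vi}(a) - \cdots = -\frac{3h^5}{40}\left[f^{iv}(a) + \frac{53}{42}\,h^2 f^{vi}(a) + \cdots\right]. \qquad (2.50)$$

Treating the quantities in square brackets of (2.48) and (2.50)
as nearly equal, we have the following approximate result:

$$E_{1/3}/E_{3/8} = 4/9. \qquad (2.51)$$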

The approximate value of the integral by Weddle's rule is

$$I_W = \frac{3h}{10}\,[f(a-3h) + 5f(a-2h) + f(a-h) + 6f(a) + f(a+h) + 5f(a+2h) + f(a+3h)]$$
$$= 6h\,f(a) + 9h^3 f''(a) + \frac{81}{20}\,h^5 f^{iv}(a) + \frac{7}{8}\,h^7 f^{vi}(a) + \cdots \qquad (2.52)$$

if the functions are replaced by their Taylor expansions. Thus the
error in Weddle's rule is

$$E_W = I - I_W = -\frac{h^7}{140}\,f^{vi}(a) - \cdots \qquad (2.53)$$
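The comparison of error terms can be checked numerically on a single block of six subintervals. In the sketch below the integrand $e^x$, the centre $a = 0$ and the width $h = 0.1$ are assumptions chosen only for illustration; each rule is written as a multiplier times $h$ times a weighted sum of the seven ordinates at $a-3h, \ldots, a+3h$.

```python
import math

def seven_point_rule(weights, multiplier, f, a, h):
    """multiplier * h * sum of weights[k] * f(a + kh) for k = -3, ..., 3."""
    return multiplier * h * sum(w * f(a + k * h)
                                for k, w in zip(range(-3, 4), weights))

RULES = {
    "trapezoidal":   ([1, 2, 2, 2, 2, 2, 1], 1 / 2),
    "one-third":     ([1, 4, 2, 4, 2, 4, 1], 1 / 3),
    "three-eighths": ([1, 3, 3, 2, 3, 3, 1], 3 / 8),
    "weddle":        ([1, 5, 1, 6, 1, 5, 1], 3 / 10),
}

a, h = 0.0, 0.1
# For the integrand e^x the true value is F(a+3h) - F(a-3h) with F = exp.
true_value = math.exp(a + 3 * h) - math.exp(a - 3 * h)

errors = {name: true_value - seven_point_rule(w, c, math.exp, a, h)
          for name, (w, c) in RULES.items()}
ratio = errors["one-third"] / errors["three-eighths"]
```

With these values the ratio of the one-third error to the three-eighths error comes out close to 4/9, and the errors decrease in magnitude from the trapezoidal rule through the three-eighths and one-third rules to Weddle's rule, as the leading terms of (2.46), (2.48), (2.50) and (2.53) suggest.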




2d.5 Euler-Maclaurin formula 
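Now we shall derive an important quadrature formula which is
due to Euler and Maclaurin. This formula can also be used to find
the sum of a series when the integral of the function involved can be
easily calculated.

Let it be required to find the sum $\sum_{r=0}^{n-1} f(a+rh)$. We define a new
function $F(x)$ by the relation $\Delta F(x) = f(x)$. Thus we have

$$\sum_{r=0}^{n-1} f(a+rh) = F(a+nh) - F(a).$$

Now,

$$F(x) = \Delta^{-1} f(x)$$
$$= [e^{hD} - 1]^{-1} f(x), \text{ using relation (2.38)}$$
$$= [hD + h^2D^2/2 + h^3D^3/6 + h^4D^4/24 + \cdots]^{-1} f(x)$$
$$= (hD)^{-1}[1 + (hD/2 + h^2D^2/6 + h^3D^3/24 + \cdots)]^{-1} f(x)$$
$$= (hD)^{-1}[1 - hD/2 + h^2D^2/12 - h^4D^4/720 + \cdots]\,f(x),$$

using the expansion of $(1+x)^{-1}$,

$$= \frac{1}{h}\,D^{-1} f(x) - \frac12\,f(x) + \frac{h}{12}\,f'(x) - \frac{h^3}{720}\,f'''(x) + \cdots.$$

As $Df(x) = f'(x)$, so $D^{-1} f(x) = \int f(x)\,dx$.
Thus

$$\sum_{r=0}^{n-1} f(a+rh) = F(a+nh) - F(a)$$
$$= \frac{1}{h}\int_a^{a+nh} f(x)\,dx - \frac12\,[f(a+nh) - f(a)] + \frac{h}{12}\,[f'(a+nh) - f'(a)] - \frac{h^3}{720}\,[f'''(a+nh) - f'''(a)] + \cdots \qquad (2.54)$$

or

$$\frac{1}{h}\int_a^{a+nh} f(x)\,dx = \left[\tfrac12 f(a) + f(a+h) + \cdots + f(a+\overline{n-1}\,h) + \tfrac12 f(a+nh)\right]$$
$$\quad - \frac{h}{12}\,[f'(a+nh) - f'(a)] + \frac{h^3}{720}\,[f'''(a+nh) - f'''(a)] - \cdots \qquad (2.55)$$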


This is known as the Euler-Maclaurin formula. In the form (2.54),
it is useful in finding the sum of a series. Using the Bernoullian
numbers $B_1 = 1/6$, $B_2 = 1/30$, ..., (2.55) can be expressed as

$$\frac{1}{h}\int_a^{a+nh} f(x)\,dx = \left[\tfrac12 f(a) + f(a+h) + \cdots + \tfrac12 f(a+nh)\right]$$
$$\quad - \frac{B_1 h}{2!}\,[f'(a+nh) - f'(a)] + \frac{B_2 h^3}{4!}\,[f'''(a+nh) - f'''(a)] - \cdots \qquad (2.55a)$$


Example 2.15 Find the value of the definite integral

$$\int_1^2 \frac{dx}{x}$$

correct to five places of decimals, using the trapezoidal, Simpson's
one-third, Simpson's three-eighths and Weddle's rules, and also
obtain the errors of approximation.

We divide the interval (1, 2) into six equal parts, each of width
$h = 1/6$. The values of the function $y = 1/x$ are next tabulated for
each of the seven boundaries:

       x          1/x
       1        1.000000
      7/6       0.857143
      8/6       0.750000
      9/6       0.666667
     10/6       0.600000
     11/6       0.545455
       2        0.500000

(a) By the trapezoidal rule, the integral is evaluated as
$$I_T = \tfrac{1}{12}\,[1.500000 + 2\times 3.419265] = 0.694877 = 0.69488, \text{ correct to five decimal places.}$$
(b) Simpson's one-third rule gives
$$I_{1/3} = \tfrac{1}{18}\,[1.500000 + 4\times 2.069265 + 2\times 1.350000] = 0.69317, \text{ correct to five decimal places.}$$
(c) Simpson's three-eighths rule gives
$$I_{3/8} = \tfrac{1}{16}\,[1.500000 + 3\times 2.752598 + 2\times 0.666667] = 0.69320, \text{ correct to five decimal places.}$$
(d) By Weddle's rule, we have for the integral the value
$$I_W = \tfrac{1}{20}\,[2.85 + 5\times 1.402598 + 6\times 0.666667] = 0.69315, \text{ correct to five decimal places.}$$
The true value of the integral is

$$I = \int_1^2 \frac{dx}{x} = \ln 2 = 0.69315.$$

Hence the absolute errors are
$$E_T = |I - I_T| = 0.00173,$$
$$E_{1/3} = |I - I_{1/3}| = 0.00002,$$
$$E_{3/8} = |I - I_{3/8}| = 0.00005$$
and
$$E_W = |I - I_W| = 0.00000.$$
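The four rules used in Example 2.15 can be written as short functions acting on a list of equidistant ordinates. This is a sketch, not the book's own procedure: the function names are illustrative, and the ordinates are computed directly from $1/x$ rather than copied from the six-decimal table, so the last digits may differ slightly from hand computation.

```python
import math

def trapezoidal(y, h):
    """Formula (2.41)."""
    return h / 2 * (y[0] + y[-1] + 2 * sum(y[1:-1]))

def simpson_one_third(y, h):
    """Formula (2.42); the number of subintervals must be even."""
    return h / 3 * (y[0] + y[-1] + 4 * sum(y[1:-1:2]) + 2 * sum(y[2:-1:2]))

def simpson_three_eighths(y, h):
    """Three-eighths rule; the number of subintervals must be a multiple of 3."""
    n = len(y) - 1
    inner = sum((2 if i % 3 == 0 else 3) * y[i] for i in range(1, n))
    return 3 * h / 8 * (y[0] + y[-1] + inner)

def weddle(y, h):
    """Formula (2.43); the number of subintervals must be a multiple of 6."""
    n = len(y) - 1
    weights = [1, 5, 1, 6, 1, 5, 1]
    total = 0.0
    for start in range(0, n, 6):
        # ordinates shared by two adjacent blocks are counted in both,
        # which gives them the weight 2 appearing in (2.43)
        total += sum(w * v for w, v in zip(weights, y[start:start + 7]))
    return 3 * h / 10 * total

h = 1 / 6
y = [1 / (1 + i * h) for i in range(7)]      # 1/x at x = 1, 7/6, ..., 2

results = {
    "trapezoidal": trapezoidal(y, h),
    "one-third": simpson_one_third(y, h),
    "three-eighths": simpson_three_eighths(y, h),
    "weddle": weddle(y, h),
}
abs_errors = {name: abs(math.log(2) - value) for name, value in results.items()}
```

The ordering of the absolute errors is the same as in the example: the trapezoidal rule is the least accurate and Weddle's rule the most accurate of the four.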
Example 2.16 Find the sum $1^3 + 2^3 + \cdots + n^3$.
We use the Euler-Maclaurin formula in the form (2.54). In the
present case we take $f(x) = x^3$, a = 1, h = 1 and n = n in (2.54). Thus

$$\begin{aligned}
\sum_{x=1}^{n} x^3 &= \int_1^{n+1} x^3\,dx - \tfrac{1}{2}\,[(n+1)^3 - 1] + \tfrac{1}{12}\,[3(n+1)^2 - 3] \\
&= [(n+1)^4 - 1]/4 - [(n+1)^3 - 1]/2 + [(n+1)^2 - 1]/4 \\
&= [n(n+1)/2]^2 .
\end{aligned}$$
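
The result of Example 2.16 can be confirmed in exact arithmetic. This is a minimal sketch; the function name `euler_maclaurin_cubes` is an illustrative choice, and the code simply evaluates the right-hand side of (2.54) for $f(x) = x^3$, a = 1, h = 1.

```python
from fractions import Fraction

def euler_maclaurin_cubes(n):
    """Right-hand side of (2.54) for f(x) = x**3, a = 1, h = 1."""
    upper = Fraction(n + 1)
    integral = (upper**4 - 1) / 4              # integral of x^3 from 1 to n+1
    half_term = (upper**3 - 1) / 2             # (1/2)[f(a+nh) - f(a)]
    derivative_term = (3 * upper**2 - 3) / 12  # (h/12)[f'(a+nh) - f'(a)]
    # f'''(x) = 6 is constant, so the h^3/720 term and all later terms vanish.
    return integral - half_term + derivative_term

for n in range(1, 11):
    direct = sum(k**3 for k in range(1, n + 1))
    assert euler_maclaurin_cubes(n) == direct == (n * (n + 1) // 2) ** 2
```

The series terminates for a cubic because every derivative beyond the third is zero, so (2.54) gives the sum exactly.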


2e NUMERICAL SOLUTION OF EQUATIONS 


In this section, we shall consider some methods of finding the 
roots of an equation, to any desired degree of accuracy, when the 
coefficients of the equation are pure numbers. Though most of the
methods can be applied to simultaneous equations in more than one
variable, we shall consider here only the case of a single variable.
Further, we shall be concerned with the determination of the real 
roots only. 




In all the methods we are going to discuss, we need, to start 
with, some approximation to the desired root. So we first consider 
methods of finding approximate values of roots. 

Let f(x)=0 be the equation whose roots have to be found. 
Then the graph of y=f(x) will cross the x-axis at the points whose 
abscissae are the roots. Approximate values of the real roots can,
therefore, be obtained from the graph, and we need only that part 
of it where it crosses the x-axis. 

Another method is based on the fact that if f(x) is continuous in 
an interval containing the root and if in that interval we find that 
f(a) and f(b) are of opposite signs, then there will be at least one 
real root between a and b. For convergence of the approximate
values of a root to the true value, it is necessary that a and b should
be close to each other.

Next we consider the different methods of determining the real 
roots of an equation. 


2e.1 Method of false position 

The oldest method of determining the real roots of a numerical
equation is the method of 'false position' or regula falsi. Suppose
the desired root lies between $x_1$ and $x_2$, which are as close as possible,
and $f(x_1)$, $f(x_2)$ have opposite signs. Assuming the part of the curve
y = f(x) between $x_1$ and $x_2$ to be smooth, we can approximate this
part of the curve by a straight line. In other words, we perform
linear interpolation to find the root of f(x) = 0 and get

$$\frac{-f(x_1)}{x - x_1} = \frac{f(x_2) - f(x_1)}{x_2 - x_1}$$

or

$$x - x_1 = -\frac{(x_2 - x_1)\,f(x_1)}{f(x_2) - f(x_1)}$$

or

$$x = x_1 + \frac{(x_2 - x_1)\,|f(x_1)|}{|f(x_1)| + |f(x_2)|}. \qquad (2.56)$$

This x is, however, not the true value of the root. This is only a
better approximation to the true root than either $x_1$ or $x_2$. We shall
repeat this process a number of times till we get the root correct up
to the desired number of decimal places.
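
The repeated use of (2.56) can be sketched in code. This is an illustration under stated assumptions rather than the exact hand procedure: the name `false_position`, the tolerance and the iteration cap are arbitrary choices, and the bracket is updated by keeping the sub-interval with a sign change instead of moving to the nearest round decimal values. The test equation is $2x - \log_{10} x - 7 = 0$, which is solved by hand in Example 2.17.

```python
import math

def false_position(f, x1, x2, tol=1e-10, max_iter=200):
    """Regula falsi on a bracket [x1, x2] with f(x1), f(x2) of opposite signs."""
    f1, f2 = f(x1), f(x2)
    if f1 * f2 > 0:
        raise ValueError("f(x1) and f(x2) must have opposite signs")
    x = x1
    for _ in range(max_iter):
        # Same point as (2.56), written without absolute values.
        x = x1 - f1 * (x2 - x1) / (f2 - f1)
        fx = f(x)
        if abs(fx) < tol:
            break
        if f1 * fx < 0:      # root lies between x1 and x
            x2, f2 = x, fx
        else:                # root lies between x and x2
            x1, f1 = x, fx
    return x

root = false_position(lambda x: 2 * x - math.log10(x) - 7, 3.0, 4.0)
print(f"{root:.5f}")
```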




2e.2 Newton-Raphson method 

The real roots of f(x) = 0 can be computed rapidly by this
method if the derivative of f(x) is a simple expression and is easily
obtainable.

Let $x_0$ be an approximate value of the root and h the correction
to be applied, so that $x_0 + h$ is the correct value of the root.

Then $f(x_0 + h) = 0$ and, expanding $f(x_0 + h)$ by Taylor's theorem,
we get

$$f(x_0) + h f'(x_0) + \frac{h^2}{2} f''(x_0 + \theta h) = 0, \qquad 0 < \theta < 1.$$

If $x_0$ is quite a good approximation to the unknown root, h will
be small and we can then neglect the term involving $h^2$. Thus we
have the relation

$$f(x_0) + h f'(x_0) = 0,$$

which gives the approximation to h as

$$h_1 = -\frac{f(x_0)}{f'(x_0)}. \qquad (2.57)$$

The improved value of the root is then
$$x^{(1)} = x_0 + h_1,$$
and the other approximations are :
$$x^{(2)} = x^{(1)} + h_2, \quad \text{where } h_2 = -\frac{f(x^{(1)})}{f'(x^{(1)})},$$
$$x^{(3)} = x^{(2)} + h_3, \quad \text{where } h_3 = -\frac{f(x^{(2)})}{f'(x^{(2)})},$$

and so on. This process is repeated till we get the root correct up
to the desired number of decimal places.

Equation (2.57) is the fundamental relation in this method.
The larger the value of f'(x) near the root, the more rapid will be
the convergence of $x^{(1)}, x^{(2)}, \ldots$ to the actual root. Should f'(x)
be small near the root, convergence would be slow, and the method
will fail if f'(x) = 0 in the neighbourhood of the root.
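
A minimal sketch of the repeated use of (2.57) follows. The function name `newton_raphson`, the stopping tolerance and the iteration cap are illustrative assumptions; the test equation $2x - \log_{10} x - 7 = 0$ and the starting value 3.7 are those of Example 2.17.

```python
import math

def newton_raphson(f, f_prime, x0, tol=1e-10, max_iter=50):
    """Repeat x <- x + h with h = -f(x)/f'(x) until the correction is tiny."""
    x = x0
    for _ in range(max_iter):
        slope = f_prime(x)
        if slope == 0:
            raise ZeroDivisionError("f'(x) vanished; the method fails here")
        h = -f(x) / slope    # the correction of relation (2.57)
        x += h
        if abs(h) < tol:
            return x
    raise RuntimeError("no convergence within max_iter steps")

f = lambda x: 2 * x - math.log10(x) - 7
f_prime = lambda x: 2 - math.log10(math.e) / x
root = newton_raphson(f, f_prime, 3.7)
print(f"{root:.6f}")
```

The explicit check on the slope mirrors the remark above that the method breaks down where f'(x) = 0.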


Example 2.17 Find the real root of 
2x—logx=7 
correct to five decimal places. 




Here $f(x) = 2x - \log x - 7$, where log denotes the common logarithm
(base 10), and it is found that f(3) and f(4)
are of opposite signs. So there is a real root of the equation f(x) = 0
between 3 and 4.

(a) Method of false position

                   x            f(x)          improved value by (2.56)
1st approx.        3          -1.47
                   4          +0.39           3 + (1 x 1.47)/1.86 = 3 + 0.79,
    difference     1           1.86           x^(1) = 3.7.

2nd approx.        3.7        -0.1682
                   3.8        +0.0203         3.7 + (0.1 x 0.1682)/0.1885 = 3.7 + 0.08923,
    difference     0.1         0.1885         x^(2) = 3.78.

3rd approx.        3.78       -0.01749
                   3.79       +0.00136        3.78 + (0.01 x 0.01749)/0.01885 = 3.78 + 0.009278,
    difference     0.01        0.01885        x^(3) = 3.7892.

4th approx.        3.7892     -0.0001475
                   3.7893     +0.0000410      3.7892 + (0.0001 x 0.0001475)/0.0001885 = 3.7892 + 0.00007824,
    difference     0.0001      0.0001885      x^(4) = 3.789278.

So the root is 3.78928, correct to five decimal places.
(b) Newton-Raphson method
$$f(x) = 2x - \log x - 7$$
and so $$f'(x) = 2 - \log e / x.$$
It may be seen from a graph that an approximate value of the root
is 3.7. Thus

$$h_1 = \frac{7 + 0.5682017 - 7.4}{2 - 0.1173767} = \frac{0.1682017}{1.8826233} \approx 0.08934,$$

$$x^{(1)} = 3.7 + 0.089 = 3.789 ;$$

$$h_2 = \frac{7 + 0.5785246 - 7.578}{2 - 0.1146196} = \frac{0.0005246}{1.8853804} = 0.0002782,$$

$$x^{(2)} = 3.7892 ;$$

$$h_3 = \frac{7 + 0.5785475 - 7.5784}{f'(3.7892)} = \frac{0.0001475}{f'(3.7892)} \approx 0.000078,$$

$$x^{(3)} = 3.789278.$$
So the root, correct to five decimal places, is 3.78928.




2e.3. Method of iteration 


In the cases where the numerical equation f(x) = 0 can be
written as

$$x = \phi(x), \qquad (2.58)$$

the real root can be determined easily by a process known as
iteration or successive approximation. Here we start with an
approximate value $x_0$ of the root and, substituting it on the right-hand
side of (2.58), get an improved value of the root, $x^{(1)}$, given by
$$x^{(1)} = \phi(x_0).$$

Again, we put $x^{(1)}$ for x on the right-hand side of (2.58) and get
the second approximation as
$$x^{(2)} = \phi(x^{(1)}).$$

This process is repeated until we have the root correct to the
desired number of decimal places.

This method is used only when $|\phi'(x)| < 1$ near the desired root,
and the smaller the derivative the more rapid will be the convergence
of the approximate roots to the correct value.
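
The process can be sketched as a short loop. The name `iterate`, the tolerance and the step limit are illustrative assumptions. The equation used is $e^x = 1 + 2x$, rewritten as $x = \ln(1 + 2x)$, which is the one treated by hand in Example 2.18.

```python
import math

def iterate(phi, x0, tol=1e-8, max_iter=1000):
    """Successive approximation x <- phi(x); returns (root, steps used)."""
    x = x0
    for step in range(1, max_iter + 1):
        x_next = phi(x)
        if abs(x_next - x) < tol:
            return x_next, step
        x = x_next
    raise RuntimeError("no convergence; is |phi'(x)| < 1 near the root?")

root, steps = iterate(lambda x: math.log(1 + 2 * x), 1.25)
print(f"root = {root:.4f} after {steps} steps")
```

Here $\phi'(x) = 2/(1+2x)$ is about 0.57 near the root, so each step reduces the error by a little under one half, which is why a fair number of steps is needed.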


2e.4 Convergence of the iteration method 
We now consider the condition under which the iteration method
will converge, i.e. the condition under which the successive
approximations $x_0, x^{(1)}, x^{(2)}, \ldots$ will tend to the true value of the root.
The true value of the root satisfies the equation

$$x = \phi(x),$$

and the first approximation is obtained as

$$x^{(1)} = \phi(x_0).$$
Subtracting, we get

$$x - x^{(1)} = \phi(x) - \phi(x_0).$$
By the mean value theorem,
$$\phi(x) - \phi(x_0) = (x - x_0)\,\phi'(\xi_0),$$
where $\xi_0$ is a point in the interval $(x_0, x)$ or $(x, x_0)$. Thus we have

$$x - x^{(1)} = (x - x_0)\,\phi'(\xi_0).$$

Similar equations hold for the other approximations :
$$x - x^{(2)} = (x - x^{(1)})\,\phi'(\xi_1),$$
$$x - x^{(3)} = (x - x^{(2)})\,\phi'(\xi_2),$$
$$\cdots$$
$$x - x^{(n)} = (x - x^{(n-1)})\,\phi'(\xi_{n-1}).$$
Multiplying together the n equations, member for member, and
dividing by the common factor $(x - x^{(1)})(x - x^{(2)})\cdots(x - x^{(n-1)})$, we get

$$x - x^{(n)} = (x - x_0)\,\phi'(\xi_0)\,\phi'(\xi_1)\cdots\phi'(\xi_{n-1}).$$

Now, if the maximum value m of $|\phi'(\xi_i)|$ is less than 1 in the
interval $(x_0, x)$ or $(x, x_0)$, so that $|\phi'(\xi_i)| \le m < 1$ for each i, we have

$$|x - x^{(n)}| \le m^n\,|x - x_0|. \qquad (2.59)$$
Thus the error after n repetitions of the process can be made as small
as we please by increasing n suitably since the r.h.s. of (2.59) depends
on $m^n$, which approaches 0 as n increases. Thus the condition for
convergence of the iteration method is that $|\phi'(x)| < 1$ in the
neighbourhood of the desired root, where $\phi(x)$ is the function occurring
in (2.58). The smaller the value of $|\phi'(x)|$ the more rapid is the
convergence.
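
The bound (2.59) can be watched at work numerically. The sketch below is an illustration under stated assumptions: it uses $\phi(x) = \ln(1+2x)$ with $x_0 = 1.25$, takes as the "true" root the value reached after many iterations, and uses $m = \phi'(x_0)$, which is the maximum of $|\phi'|$ between $x_0$ and the root because $\phi'(x) = 2/(1+2x)$ decreases as x increases. The function names are arbitrary.

```python
import math

def phi(x):
    return math.log(1 + 2 * x)

def phi_prime(x):
    return 2 / (1 + 2 * x)

def errors_and_bounds(x0=1.25, steps=14):
    """Rows (n, actual error, m**n * initial error) for the iteration x <- phi(x)."""
    root = x0
    for _ in range(200):          # long run, used as the reference root
        root = phi(root)
    m = phi_prime(x0)             # largest |phi'| between x0 and the root
    rows, x = [], x0
    for n in range(1, steps + 1):
        x = phi(x)
        rows.append((n, abs(root - x), m**n * abs(root - x0)))
    return rows

for n, error, bound in errors_and_bounds():
    print(f"n = {n:2d}   error = {error:.3e}   bound = {bound:.3e}")
```

In every row the actual error stays below the bound, and both shrink roughly in proportion to $m^n$.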


2e.5 Convergence of the Newton-Raphson method 
The Newton-Raphson method can also be considered as an
iteration method. The nth approximation in this method is given by
$$x^{(n)} = x^{(n-1)} - \frac{f(x^{(n-1)})}{f'(x^{(n-1)})},$$

which may be written in the form

$$x = \phi(x),$$
with $$\phi(x) = x - \frac{f(x)}{f'(x)}.$$

Then by the result concerning the convergence of the iteration
method, we know that the Newton-Raphson method will converge, if

$$|\phi'(x)| = \left|\frac{f(x)\,f''(x)}{[f'(x)]^2}\right| < 1,$$

i.e. if

$$|f(x)\,f''(x)| < [f'(x)]^2$$

in the neighbourhood of the desired root.




Example 2.18 Find by iteration the positive root of the equation

$$e^x = 1 + 2x,$$
correct to four places of decimals.
Here
$$e^x = 1 + 2x,$$
or $$x = \ln(1 + 2x) = \log(1 + 2x)/\log e.$$
Taking $f(x) = x - \log(1 + 2x)/\log e$ and forming a set of values of
f(x) for different values of x, it is found that f(1.25) and f(1.26) are of
opposite signs. So a positive root lies between 1.25 and 1.26. Thus
we begin the process of iteration with x = 1.25 as the starting value.

       x          φ(x) = log (1+2x)/log e
    1.250000          1.252760
    1.252760          1.254336
    1.254336          1.255235
    1.255235          1.255747
    1.255747          1.256038
    1.256038          1.256205
    1.256205          1.256299
    1.256299          1.256353
    1.256353          1.256384
    1.256384          1.256402
    1.256402          1.256412
    1.256412          1.256418
    1.256418          1.256424
    1.256424          1.256427

So the root is 1.2564, correct to four places of decimals.


2e.6 Horner’s method 

Horner discovered a convenient method for obtaining the coeffi- 
cients of an algebraic equation whose roots differ by a given 
constant from the roots of a given algebraic equation. By repeated 
application of this method, the real roots of an algebraic equation 
can be found up to a desired number of decimal places, 




First, we shall discuss the method as used in diminishing all
the roots by a constant s. For the purpose of illustration, we take

$$Ax^4 + Bx^3 + Cx^2 + Dx + E = 0 \qquad (2.61)$$

to be the given equation, but the method is general and can be
applied to any polynomial. If f(x) be the polynomial on the l.h.s.
of (2.61), the equation whose roots are each diminished by s from
the roots of (2.61) will be
$$f(x + s) = 0$$

or $$\frac{x^4}{24} f^{iv}(s) + \frac{x^3}{6} f'''(s) + \frac{x^2}{2} f''(s) + x f'(s) + f(s) = 0. \qquad (2.62)$$

Now, from (2.61) we find that
$f'(s) = 4As^3 + 3Bs^2 + 2Cs + D$,
$f''(s) = 12As^2 + 6Bs + 2C$,
$f'''(s) = 24As + 6B$,
$f^{iv}(s) = 24A$.
Thus the coefficients of (2.62) are given by

$$\left.\begin{aligned}
f^{iv}(s)/24 &= A, \qquad f'''(s)/6 = 4As + B, \\
f''(s)/2 &= 6As^2 + 3Bs + C, \\
f'(s) &= 4As^3 + 3Bs^2 + 2Cs + D, \\
f(s) &= As^4 + Bs^3 + Cs^2 + Ds + E.
\end{aligned}\right\} \qquad (2.63)$$

We next describe Horner's scheme of systematically obtaining
the coefficients of (2.62). We write down the coefficients A, B, C, ...
of (2.61) in a row and form the following scheme. A letter below
a line stands for the sum of the two quantities immediately above the
line ; e.g. B + As = M.

    A     B     C     D     E
          As    Ms    Ns    Os
         ------------------------
          M     N     O   | ν
          As    Ps    Qs
         ------------------
          P     Q   | μ
          As    Rs
         ------------
          R   | χ
          As
         ------
          φ

It is easily verified that the quantities ν, μ, χ and φ obtained in
the above scheme have the following values given in (2.63) :

$$\left.\begin{aligned}
\phi &= 4As + B = f'''(s)/6, \\
\chi &= 6As^2 + 3Bs + C = f''(s)/2, \\
\mu &= 4As^3 + 3Bs^2 + 2Cs + D = f'(s), \\
\nu &= As^4 + Bs^3 + Cs^2 + Ds + E = f(s).
\end{aligned}\right\} \qquad (2.64)$$

Thus the equation (2.62), whose roots are the roots of (2.61) each
diminished by s, is
$$Ax^4 + \phi x^3 + \chi x^2 + \mu x + \nu = 0. \qquad (2.65)$$
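
The scheme amounts to repeated synthetic division by (x - s), each pass stopping one column earlier than the last. A minimal sketch, in which the function name `diminish_roots` is an illustrative choice and coefficients are listed from the highest power downwards:

```python
from fractions import Fraction

def diminish_roots(coeffs, s):
    """Coefficients of f(x + s), highest power first, by Horner's scheme."""
    c = list(coeffs)
    n = len(c) - 1
    for last in range(n, 0, -1):         # each pass fixes one more coefficient
        for i in range(1, last + 1):
            c[i] += c[i - 1] * s         # "sum of the two quantities above the line"
    return c

# x^2 - 3x + 2 has roots 1 and 2; diminishing them by 1 gives x^2 - x.
print(diminish_roots([1, -3, 2], 1))

# A cubic with decimal coefficients, kept exact with Fraction.
cubic = [Fraction(v) for v in ("1", "-4.759", "1.759", "7.518")]
shifted = diminish_roots(cubic, 3)
print([float(v) for v in shifted])
```

Exact fractions are used for the second polynomial only so that the printed coefficients are free of rounding noise; ordinary floats would serve for most purposes.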


We next illustrate with an example how this scheme is applied 
repeatedly in order to obtain successively the different digits of the 
desired root. 


Example 2.19 Find the largest positive root of the equation
$$x^3 - 4.759x^2 + 1.759x + 7.518 = 0.$$

The desired root lies between 3 and 4. So we first diminish the
roots of the above equation by 3.

    1    -4.759     1.759      7.518
          3        -5.277    -10.554
         ----------------------------
         -1.759    -3.518  |  -3.036
          3         3.723
         ------------------
          1.241  |  0.205
          3
         --------
          4.241

Thus the equation, whose roots are the roots of the given equation
diminished by 3, is
$$x^3 + 4.241x^2 + 0.205x - 3.036 = 0.$$

The first decimal place of the desired root is now the same as the
largest root of the above equation lying between 0 and 1, which is
the same as the largest root lying between 0 and 10 of the following
equation, obtained by multiplying all the roots by 10 :

$$x^3 + 42.41x^2 + 20.5x - 3036 = 0.$$

It is found to be between 7 and 8 ; so we diminish the roots by 7.

    1    42.41     20.5      -3036
          7       345.87      2564.59
         ------------------------------
         49.41    366.37  |   -471.41
          7       394.87
         ------------------
         56.41  | 761.24
          7
         --------
         63.41

The transformed equation is
$$x^3 + 63.41x^2 + 761.24x - 471.41 = 0,$$
and the desired root of the original equation up to the first two
digits is 3.7.
The third digit of the root is the largest root of
$$x^3 + 63.41x^2 + 761.24x - 471.41 = 0$$
lying between 0 and 1, or the largest root of
$$x^3 + 634.1x^2 + 76124x - 471410 = 0$$
lying between 0 and 10. It is found to be between 5 and 6. An
approximate idea of the root can be obtained by solving the linear
part of the equation, viz.
$$76124x - 471410 = 0.$$
We diminish the roots of the equation by 5.

    1    634.1    76124       -471410
          5        3195.5      396597.5
         --------------------------------
         639.1    79319.5  |   -74812.5
          5        3220.5
         -------------------
         644.1  | 82540.0
          5
         --------
         649.1

The transformed equation is
$$x^3 + 649.1x^2 + 82540.0x - 74812.5 = 0,$$

and the desired root up to the first three digits is 3.75.

An approximate value for the fourth digit of the root is obtained
by solving

$$82540x - 74812.5 = 0.$$

This suggests that the next digit is 9. In this way, we may repeat
the process stated above and obtain as many digits of the desired
root as needed.
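
The digit-by-digit procedure of Example 2.19 can be sketched as follows. Everything here is illustrative: the function names are arbitrary, exact fractions are used to avoid rounding, and the sketch assumes that the polynomial is negative just below the desired root and positive just above it, with no other root in the same unit interval.

```python
from fractions import Fraction

def diminish_roots(coeffs, s):
    """Coefficients of f(x + s), highest power first (Horner's scheme)."""
    c = list(coeffs)
    for last in range(len(c) - 1, 0, -1):
        for i in range(1, last + 1):
            c[i] += c[i - 1] * s
    return c

def multiply_roots_by_10(coeffs):
    """Equation whose roots are 10 times those of the given one."""
    return [c * 10**k for k, c in enumerate(coeffs)]

def evaluate(coeffs, x):
    value = 0
    for c in coeffs:
        value = value * x + c
    return value

def horner_digits(coeffs, integer_part, places):
    """Decimal digits of the root lying in [integer_part, integer_part + 1)."""
    c = diminish_roots(coeffs, integer_part)
    digits = []
    for _ in range(places):
        c = multiply_roots_by_10(c)
        # Largest digit at which the polynomial has not yet turned positive.
        d = max(k for k in range(10) if evaluate(c, k) <= 0)
        digits.append(d)
        c = diminish_roots(c, d)
    return digits

cubic = [Fraction(v) for v in ("1", "-4.759", "1.759", "7.518")]
digits = horner_digits(cubic, 3, 4)
print("3." + "".join(str(d) for d in digits))
```

For this cubic the process terminates: after the digit 9 the constant term becomes zero, so every later digit is 0.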




Exercises 
2.1 Show that the rth-order divided difference is a symmetric 
function of the arguments. 


2.2 Obtain Newton's forward and Newton's backward formulae
from Newton's divided difference formula.


2.3 Express the following divided differences in terms of finite 
differences : 


f(a, a-h, a+h) and f(a-h, a, a+h, a+2h).


2.4 Define the first ascending (or backward) difference $\nabla f(x)$ of
f(x) by the relation
$$\nabla f(x) = f(x) - f(x-h)$$
and the higher-order ascending differences by
$$\nabla^k f(x) = \underbrace{\nabla\nabla\cdots\nabla}_{k\ \text{times}}\, f(x) = \nabla(\nabla^{k-1} f(x)).$$

Then show that
$$\nabla^k f(x) = \Delta^k f(x - kh).$$

(The operator $\nabla$ is called the nabla operator, from the Arabic word
for harp).

2.5 Define the central difference operator $\delta$ by
$$\delta = E^{1/2} - E^{-1/2}$$
or, equivalently, by
$$\delta = \Delta E^{-1/2} = \nabla E^{1/2}.$$
The averaging operator $\mu$ is defined by
$$\mu = \tfrac{1}{2}\,[E^{1/2} + E^{-1/2}].$$
Show that the central difference interpolation formulae can be
elegantly expressed in terms of these operators $\delta$ and $\mu$.

2.6 Define the factorial polynomials $u^{[k]}$, for integral k, by
$$u^{[k]} = u(u-1)(u-2)\cdots(u-\overline{k-1})$$
and the reciprocals of polynomials by
$$u^{[-k]} = \frac{1}{(u+1)(u+2)\cdots(u+k)}.$$
Then show that
$$\Delta u^{[k]} = k\,u^{[k-1]},$$
$$\Delta u^{[-k]} = -k\,u^{[-k-1]}.$$


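2.7 Use the method of separation of symbols to prove the
following identities :

(i) $$x f(a) + x^2 f(a+h) + x^3 f(a+2h) + \cdots = \frac{x}{1-x}\,f(a) + \frac{x^2}{(1-x)^2}\,\Delta f(a) + \frac{x^3}{(1-x)^3}\,\Delta^2 f(a) + \cdots$$

(ii) $$(n+1) f(a) + \binom{n+1}{2} \Delta f(a) + \binom{n+1}{3} \Delta^2 f(a) + \cdots + \Delta^n f(a) = f(a) + f(a+h) + f(a+2h) + \cdots + f(a+nh).$$

(iii) $$f(a+nh) + x\,\Delta f(a+\overline{n-1}\,h) + \binom{x+1}{2} \Delta^2 f(a+\overline{n-2}\,h) + \binom{x+2}{3} \Delta^3 f(a+\overline{n-3}\,h) + \cdots = f(a+\overline{x+n}\,h).$$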

2.8 Show that the linear interpolation formula can be expressed
in the form
$$\phi(x) = \frac{(x_1 - x)\,f(x_0) + (x - x_0)\,f(x_1)}{x_1 - x_0}$$
and the corresponding remainder term $R(x) = f(x) - \phi(x)$ has the
following bound :

$$|R(x)| \le \tfrac{1}{8}\,(x_1 - x_0)^2 M,$$

where $|f''(x)| \le M$ and $x_0 \le x \le x_1$.

2.9 Suppose f(x) is known for four equidistant values of x,
viz. $x_0$, $x_0 + h$, $x_0 + 2h$ and $x_0 + 3h$. Then show that the maximum error
for interpolating between $x_0 + h$ and $x_0 + 2h$ is given by

$$\frac{3h^4}{128}\,M,$$
where $\max |f^{iv}(x)| = M$ for $x_0 \le x \le x_0 + 3h$.

2.10 Obtain the following approximate results :
(i) $$f'(x) \simeq [\Delta f(x) - \Delta^2 f(x)/2 + \Delta^3 f(x)/3 - \cdots]/h ;$$
(ii) $$f'(x) \simeq \frac{f(x+h) - f(x-h)}{2h}.$$

[Hints : For (i), differentiate Newton's forward formula, and for
(ii), differentiate Stirling's formula.]


2.11 Obtain the following quadrature formula :

$$\int_{a_0}^{a_0+3h} f(x)\,dx = \frac{3h}{8}\,[f(a_0) + 3f(a_0+h) + 3f(a_0+2h) + f(a_0+3h)].$$

[Hints : In (2.49), take n = 3 and neglect differences of all orders
above the third.

This is known as Simpson's three-eighths rule and is less accurate
than Simpson's one-third rule.]

2.12 If $u_x$ is quadratic in x, find a quadrature formula for $\int_0^1 u_x\,dx$
in terms of $u_{-1}$, $u_0$ and $u_1$.

2.13 Show that Weddle's rule may be obtained by combining
Simpson's one-third and three-eighths rules in the ratio 9 : -4.
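2.14 (Gregory's formula) Show that

$$\begin{aligned}
\frac{1}{h}\int_a^{a+nh} f(x)\,dx &= \Big[\tfrac{1}{2} f(a) + f(a+h) + \cdots + \tfrac{1}{2} f(a+nh)\Big] - \tfrac{1}{12}\,[\Delta f(a+\overline{n-1}\,h) - \Delta f(a)] \\
&\quad - \tfrac{1}{24}\,[\Delta^2 f(a+\overline{n-2}\,h) + \Delta^2 f(a)] - \tfrac{19}{720}\,[\Delta^3 f(a+\overline{n-3}\,h) - \Delta^3 f(a)] \\
&\quad - \tfrac{3}{160}\,[\Delta^4 f(a+\overline{n-4}\,h) + \Delta^4 f(a)] - \cdots
\end{aligned}$$

[Hints : In formula (2.55), replace the derivatives of f at a by
(2.39) and those at a + nh by (2.39a).
This is a form of the Euler-Maclaurin formula using differences
in place of derivatives, and this modification is due to Gregory.]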


2.15 (Stirling's approximation to the factorial) Show that, for large
n, the value of n! is given approximately by

$$n! = \sqrt{2\pi n}\; n^n \exp\Big[-n + \frac{1}{12n} - \frac{1}{360n^3} + \cdots\Big].$$

[Hints : Evaluate $\log n! = \sum \log i$ by using formula (2.54) and
express it as the sum of a function of n and a constant term c.
Determine c by using Wallis' formula

$$\log(\pi/2) = \lim_{n \to \infty} \log\big[2^{4n}(n!)^4 / \{((2n)!)^2 (2n+1)\}\big].]$$


2.16 Show that the iteration formula for the square root of N is
$$x_{n+1} = \frac{1}{2}\Big(x_n + \frac{N}{x_n}\Big),$$
where $x_n$ and $x_{n+1}$ are, respectively, the nth and (n+1)st
approximations of $\sqrt{N}$.

[Hint : Take $f(x) = x^2 - N$ and apply the Newton-Raphson
method.]

2.17 (Continuation) Obtain the iteration formula for the
reciprocal of N.


2.18 Given the following table, determine log 261.8 and log 267.5 :

      x         log x
     261      2.4166405
     262      2.4183013
     263      2.4199557
     264      2.4216039
     265      2.4232459
     266      2.4248816
     267      2.4265113
     268      2.4281348

Ans. 2.4179697 ; 2.4273238.

2.19 Various compressional forces were applied to a spring. The
following table shows the loads (y, in kg.) that were needed to
produce different degrees of contraction (x, in mm.) :

      x          y
      0          0
      5         48.7
     10        105.2
     15        172.4
     20        253.4
     25        351.2
     30        468.7

Using the above table, estimate the load needed to produce a
contraction of 34 mm. Ans. 578.9 kg.


2.20 The fourth powers of a number of integers are shown 
below : 


      x           x^4
     80       40,960,000
     83       47,458,321
     86       54,700,816
     89       62,742,241
     92       71,639,296
     95       81,450,625

Calculate the values of $(8.28)^4$, $(9.33)^4$, $(8.67)^4$ and $(8.84)^4$.
2.21 With the help of the following table, determine the value
of θ for which sin θ = 0.75 :

      θ        sin θ
     47°      0.73135
     48°      0.74314
     49°      0.75471
     50°      0.76604
     51°      0.77715

Ans. 48.59°.


2.22 The growth of population in India, according to the decen- 


nial censuses, is shown below : 
Census year 
é 1901 
1911 
1921 
1931 
1941 
1951 
1961 


Population (in lakhs) 


2,3853 


4,391 


The census figure for 1941, which is not given here, is known to 
be highly unreliable. Give an estimate of the actual population 


for 1941. 
Ans. 3,207 lakhs.

2.23 Given
f(0) = 1,
f(1) + f(2) = 10,
f(3) + f(4) + f(5) = 65,
find f(4).
[Hint : Take
$$f(x) = a + bx + cx^2.$$
Estimate a, b and c from the three given equations.]
Ans. 21.




2.24 The following table gives the values of the function
$F(.05 ; \nu_1, \nu_2)$ for different values of $\nu_1$ and $\nu_2$ :

                        ν1
     ν2       10       20       30       40
      5      4.74     4.56     4.50     4.46
     10      2.98     2.77     2.70     2.66
     15      2.54     2.33     2.25     2.20
     20      2.35     2.12     2.04     1.99

Find the values of F(.05 ; 15, 8), F(.05 ; 24, 12) and F(.05 ;
38, 19), taking $1/\nu_1$ and $1/\nu_2$ as the arguments.

2.25 Find the smallest positive real root of $xe^x - 2 = 0$, to four
significant figures, by the method of false position. Ans. 0.8526.

2.26 Find by the Newton-Raphson method, to five significant
figures, the root of $\sin x - \frac{x+1}{x-1} = 0$. (An approximate value of the
root from graphs of $y = \sin x$ and $y = \frac{x+1}{x-1}$ is known to be -0.4.)
Ans. -0.42036.
2.27 Find the positive root of 
x* +5x—1000—0 
correct to four significant digits. 


SUGGESTED READING 


[1] Butler, R. and Kerr, E. An Introduction to Numerical Analysis (Chs.
1-5). Isaac Pitman, 1962.

[2] Freeman, H. Finite Differences for Actuarial Students (Chs. 1-5, 9).
Cambridge University Press, 1962.

[3] Henrici, P. Elements of Numerical Analysis. John Wiley, 1964.

[4] Hildebrand, F. B. Introduction to Numerical Analysis. McGraw-
Hill, 1956, and Tata McGraw-Hill, 1974.

[5] Kunz, K. S. Numerical Analysis (Chs. 1-7, 11). McGraw-Hill, 1957.

[6] Scarborough, J. B. Numerical Mathematical Analysis (Chs. 1-7, 9).
Oxford University Press, 1958, and Oxford Book Co. (Indian
Ed.), 1946.

[7] Whittaker, E. and Robinson, G. Calculus of Observations (Chs.
1-4, 6, 7). Blackie, 1946.




3 ELEMENTS OF 
PROBABILITY THEORY 


3.1 Meaning of probability 

The word probability may be used in two different contexts. 
First, it may be used in regard to some proposition. Take, for 
instance, the statement : “It is very probable that India will adhere 
to the democratic system of government till the end of this century” 
or “It is very improbable that the country’s ‘brain drain’ will 
stop in the near future”. Probability here means the degree of 
belief in the proposition of the person making the statement. This 
may be called subjective probability. 

Alternatively, the word may be used in regard to the results of an 
experiment that can, conceivably, be repeated an infinite number of 
times under essentially similar conditions. The results will be called 
events. (Note that the word ‘event’, as well as the word ‘experiment’, 
is being used in a perfectly general sense. Thus the drawing of a 
ball from an urn having, say, 20 balls, the throwing of a coin and
the inspection of an article produced by a given machine will all be
called experiments. And it is an event that a ball drawn from the 
urn is red, that in five tosses of the coin one gets two heads or that 
an article produced by the machine is defective.) The probability 
of an event here refers to the proportion of cases in which the event 
occurs in such repetitions of the experiment. This type of probability
may be called objective, being a part of the real world, and it is 
with this sense of the word that we shall be concerned in the present 
discussion. 

3.2 Notation and terminology 

The events relating to an experiment will not be all of the same 
nature : some events may be more or less complex than the others. 
For instance, when a six-faced die is thrown, the appearance of an even 
number of points is a more complex event than, say, the appearance 
of five points. For the former can be looked upon as being composed
of a number of events of the latter type : an even number of points
means 2 points or 4 points or 6 points. The latter type of event 
cannot be decomposed further. The results of an experiment that 
cannot be decomposed further are called elementary events, and the 
whole set of elementary events is called the sample space of the experi- 
ment. The sample space must be kept clearly in mind for a proper 
understanding of a probability problem. 

An elementary event is denoted by e or w, with or without 
suffixes, while a general event is denoted by one of the capitals 
A, B, C, etc. Any event (including the sample space, which is also
called the sure event) may be defined by listing all its elementary 
events, through the use of the set-theoretic notation. 


Example 3.1 If a coin is thrown, the sample space of the experi- 
ment may be expressed as {H, T}, H and T denoting, respectively, 
the appearance of a head and the appearance of a tail. 

If two coins are thrown, the event that the first coin gives a head 
may be expressed as {HH, HT}, while the sample space has the 
representation {HH, HT, TH, TT}. 

In case the experiment consists in measuring the heights (in, say,
cm.) of schoolchildren, the sample space will be $\{\omega \mid 0 < \omega < \infty\}$, i.e.
will be the set of all positive real numbers.

Certain operations with events, defined in (a)-(d) below, will be 
important for a treatment of probability theory. 

(a) Union: $A \cup B$ will denote the occurrence of either A or B
or both A and B, and will be called the union of A and B ; similarly,
the union $\bigcup_{i=1}^{m} A_i = A_1 \cup A_2 \cup \cdots \cup A_m$ will denote the occurrence of at
least one of the events $A_1, A_2, \ldots, A_m$.


Fig. 3.1 Shaded area is $A \cup B$.




(b) Difference: $A - B$ will denote the occurrence of A together
with the non-occurrence of B and will be called the difference of
A from B.


Fig. 3.2 Shaded area is $A - B$.


(c) Intersection: $A \cap B$ will denote the occurrence of both A
and B, and will be called the intersection of A and B ; similarly,
the intersection $\bigcap_{i=1}^{m} A_i = A_1 \cap A_2 \cap \cdots \cap A_m$ will denote the
simultaneous occurrence of $A_1, A_2, \ldots, A_m$.


Fig. 3.3 Shaded area is $A \cap B$.


(d) Complementation: $A^c$ or $\bar{A}$ or $A'$ will denote the
non-occurrence of A and will be called the complement of A.*


Fig. 3.4 Shaded area is $A^c$.


One should note that the union, difference and intersection of
events and the complement of an event are themselves events.


*It follows that $A - B = A \cap B^c$.


The different operations may be represented by means of Venn
diagrams, where the collection of geometrical points within a given
area is taken to represent the sample space, while the collection of
points within an oval is taken to represent an event. This has
been done in Figs. 3.1 to 3.4, where we have taken a rectangle for
the sample space, a smaller oval within the rectangle for the event A
and a bigger oval for the event B. (Note, however, that the events
resulting from the operations cannot be always represented by
ovals.)


It will be found that the operations of union and intersection
have the following properties :

Commutativity :  $A \cup B = B \cup A$, $A \cap B = B \cap A$.   (3.1)
Associativity :  $A \cup (B \cup C) = (A \cup B) \cup C$,
                 $A \cap (B \cap C) = (A \cap B) \cap C$.   (3.2)
Distributivity : $A \cap (B \cup C) = (A \cap B) \cup (A \cap C)$,
                 $A \cup (B \cap C) = (A \cup B) \cap (A \cup C)$.   (3.3)
Idempotency :    $A \cup A = A$, $A \cap A = A$.   (3.4)

Also, these two and complementation obey De Morgan's rules :
$(A^c)^c = A$, $(A \cup B)^c = A^c \cap B^c$
and $(A \cap B)^c = A^c \cup B^c$.
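
These rules can be checked mechanically on a small sample space. The sketch below is only an illustration: the sample space (the six faces of a die) and the particular events A, B and C are arbitrary choices, and Python's built-in set operators stand in for union, intersection and difference.

```python
sample_space = frozenset(range(1, 7))   # faces of a six-faced die
A = frozenset({2, 4, 6})                # an even number of points
B = frozenset({4, 5, 6})                # at least four points
C = frozenset({1, 2})                   # at most two points

def complement(event):
    """Non-occurrence of the event, relative to the sample space."""
    return sample_space - event

# Commutativity, associativity, distributivity, idempotency
assert A | B == B | A and A & B == B & A
assert A | (B | C) == (A | B) | C and A & (B & C) == (A & B) & C
assert A & (B | C) == (A & B) | (A & C)
assert A | (B & C) == (A | B) & (A | C)
assert A | A == A and A & A == A

# De Morgan's rules and the footnote identity for the difference
assert complement(complement(A)) == A
assert complement(A | B) == complement(A) & complement(B)
assert complement(A & B) == complement(A) | complement(B)
assert A - B == A & complement(B)

print(sorted(A | B), sorted(A & B), sorted(A - B), sorted(complement(A)))
```

A check on one example is of course not a proof; it only shows the identities in action.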


3.3 Classical definition of probability 

Our treatment of probability theory will be restrictive in two 
respects. First, for the sake of simplicity, we shall assume that the 
total number of elementary events in the sample space is finite (say r). 
Thus the treatment will not be applicable to cases where the number 
of elementary events is infinite (either countable or uncountable). 

Secondly, it will be assumed that the experiment is such that the 
r elementary events are equally likely—in the sense that, when all 
relevant evidence is taken into account, no one of them can be 
expected to occur in preference to the others. This will mean that 
the probability P({e}) of any elementary event e is to be taken
as 1/r.




The probability P(A) of any event A (not necessarily elementary)
is then

$$P(A) = \frac{r(A)}{r}, \qquad (3.5)$$

where r(A) is the number of elementary events favourable to A (so
that A happens when one of them happens and conversely).
Equation (3.5) gives what is called the classical definition of
probability, associated with the names of, among others, Laplace
and James Bernoulli.
Some elementary properties of the function P defined on the
whole class of events follow immediately from (3.5) :
(a) Since $0 \le r(A) \le r$, we have, on dividing by r,
$$0 \le P(A) \le 1 \qquad (3.6)$$
for any event A.
(b) If A is an impossible event, r(A) = 0, implying that
P(A) = 0.
(c) If A is a sure (or certain) event, r(A) = r, so that
P(A) = 1.

(d) Let the occurrence of A imply the occurrence of B. Then
every event that is favourable to A is necessarily favourable to B.
Hence $r(A) \le r(B)$, and so

$$P(A) \le P(B).$$


In Examples 3.2—3.4 we shall consider some direct applications of 
the definition in computing the probabilities of given events. 


Example 3.2 Suppose a six-faced die is thrown. What is the
probability that the number appearing uppermost is even ?

There are six possible cases as to the number appearing on the
uppermost face of the die, viz. 1, 2, 3, 4, 5 and 6. These give the
six elementary events defining the sample space of the experiment.
If the die is perfectly regular in shape and is homogeneous, and if the
throw is made without giving any preference to any particular face,
then these cases may also be considered equally likely. Of these six
cases, three are favourable to the appearance of an even number,
viz. 2, 4 and 6. Hence the required probability is, under the above
assumptions,
$$\frac{3}{6} = \frac{1}{2}.$$
Example 3.3 The digits 1, 2, 3, 4, 5, 6 and 7 are written down in
random order to give a 7-digit number. What is the probability
that the number is divisible by 4 ?
The digits can be arranged in a total of 7! ways, which define
the sample space in the present case. That the digits are placed
in random order means that the 7! ways are to be considered equally
likely. A 7-digit number may be looked upon as the sum of two
parts : (a) 100 times the number formed by the first 5 digits and
(b) the number formed by the last 2 digits. The first part is always
divisible by 4. Hence for the 7-digit number to be divisible by 4, it
is only necessary that the last two digits form a multiple of 4. This
will be the case if the last two digits are 12, 16, 24, 32, 36, 52, 56,
64, 72 or 76. In each case the first five places of the number can
be filled in 5! ways, so that there is a total of
$$10 \times 5!$$
favourable elementary events. The required probability is, therefore,
$$\frac{10 \times 5!}{7!} = \frac{5}{21}.$$
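
Since the sample space here has only 7! = 5040 points, the count can be confirmed by listing every arrangement. A minimal brute-force sketch (the variable names are illustrative):

```python
from fractions import Fraction
from itertools import permutations

total = 0
favourable = 0
for arrangement in permutations("1234567"):
    total += 1
    if int("".join(arrangement)) % 4 == 0:   # the 7-digit number is divisible by 4
        favourable += 1

probability = Fraction(favourable, total)
print(favourable, total, probability)
```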
Example 3.4 What is the probability of getting 25 points in 5
throws of a die ?
There are six possible cases as to the number of points obtained
in each throw. An elementary event may, therefore, be represented
by a vector

$$(i_1, i_2, i_3, i_4, i_5),$$
where $i_1$ = 1, 2, 3, 4, 5 or 6 and represents the number of points
obtained in the 1st throw, and similarly for the other components.
The total number of elementary events in the sample space is, then,
$$6 \times 6 \times 6 \times 6 \times 6 = 6^5.$$
Provided the die is perfect and provided each throw is made without
giving any conscious advantage to any particular face to turn up,
these $6^5$ elementary events may be supposed to be equally likely.

As to the number of elementary events favourable to the
occurrence of 25 points, we see that it is the same as the coefficient
of $x^{25}$ in the expansion of

$$(x + x^2 + x^3 + x^4 + x^5 + x^6)^5 = x^5 (1 - x^6)^5 (1 - x)^{-5}.$$

But this, again, is the same as the coefficient of $x^{20}$ in the expansion
of

$$(1 - x^6)^5 (1 - x)^{-5}. \qquad (3.7)$$
Now,
$$(1 - x^6)^5 (1 - x)^{-5} = (1 - 5x^6 + 10x^{12} - 10x^{18} + \cdots)\Big(1 + 5x + \frac{5 \times 6}{2!}x^2 + \frac{5 \times 6 \times 7}{3!}x^3 + \cdots\Big).$$

Hence the coefficient of $x^{20}$ in (3.7) is
$$\begin{aligned}
&\binom{24}{4} - 5\binom{18}{4} + 10\binom{12}{4} - 10\binom{6}{2} \\
&= \frac{24 \times 23 \times 22 \times 21}{4!} - 5 \times \frac{18 \times 17 \times 16 \times 15}{4!} + 10 \times \frac{12 \times 11 \times 10 \times 9}{4!} - 10 \times \frac{6 \times 5}{2!} \\
&= 126.
\end{aligned}$$
As such, the required probability is
$$\frac{126}{6^5} = \frac{7}{2 \times 6^3} = \frac{7}{432}.$$
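
The count of 126 favourable cases can be confirmed both by direct enumeration of the $6^5$ elementary events and by evaluating the binomial expression above. A minimal sketch (variable names are illustrative; `math.comb` needs Python 3.8 or later):

```python
from fractions import Fraction
from itertools import product
from math import comb

# Direct enumeration of all vectors (i1, ..., i5) with each component in 1..6
favourable = sum(1 for throws in product(range(1, 7), repeat=5)
                 if sum(throws) == 25)

# Coefficient of x^20 in (1 - x^6)^5 (1 - x)^(-5)
by_series = comb(24, 4) - 5 * comb(18, 4) + 10 * comb(12, 4) - 10 * comb(6, 2)

probability = Fraction(favourable, 6**5)
print(favourable, by_series, probability)
```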


3.4 Theorems on the probability of a union of events

We shall now state and prove some theorems which will enable us 
to obtain the probability of the union (i.e. of the occurrence of at 
least one) of a set of events in terms of the probabilities of the compo- 
nent events and their intersections. 


Theorem 3.1 If $A_1, A_2, \ldots, A_m$ are mutually exclusive, then
$$P(A_1 \cup A_2 \cup \cdots \cup A_m) = P(A_1) + P(A_2) + \cdots + P(A_m).$$

Proof : As in Section 3.3, let us suppose that the total number of
elementary events is r, of which $r(A_i)$ are favourable to $A_i$. Since $A_1$
and $A_2$ are mutually exclusive, the
number of elementary events that are favourable to either $A_1$ or $A_2$
is then $r(A_1) + r(A_2)$. Hence

$$P(A_1 \cup A_2) = \frac{r(A_1) + r(A_2)}{r} = \frac{r(A_1)}{r} + \frac{r(A_2)}{r} = P(A_1) + P(A_2). \qquad (3.8)$$

Using result (3.8) repeatedly, we have, for any m,
$$\begin{aligned}
P(A_1 \cup A_2 \cup \cdots \cup A_m) &= P(A_1 \cup A_2 \cup \cdots \cup A_{m-1}) + P(A_m) \\
&= P(A_1) + P(A_2) + \cdots + P(A_m).
\end{aligned}$$

Corollary 3.1.1 Suppose the events $A_1, A_2, \ldots, A_m$ are exhaustive,
i.e. are such that one of them must occur (meaning that $A_1 \cup A_2 \cup \cdots \cup A_m$
is a certain event), and also mutually exclusive. Then

$$P(A_1) + P(A_2) + \cdots + P(A_m) = 1.$$

In particular, the events A and $A^c$, the complement of A,
are exhaustive and mutually exclusive. Hence $P(A) + P(A^c) = 1$,
implying that

$$P(A) = 1 - P(A^c).$$
This makes possible the computation of the probability of an event
from that of its complement and vice versa.

Corollary 3.1.2 Let $A_1, A_2, \ldots, A_m$ be exhaustive forms of A
(i.e., let $A_1 \cup A_2 \cup \cdots \cup A_m = A$) and be mutually exclusive. Then

$$P(A) = P(A_1) + P(A_2) + \cdots + P(A_m).$$

Corollary 3.1.3 Suppose the occurrence of A implies the occurrence
of B. Then B occurs if, and only if, one of the mutually exclusive
events A and B - A occurs. Hence, from Corollary 3.1.2,

$$P(B) = P(A) + P(B - A),$$
so that
$$P(B - A) = P(B) - P(A).$$



Example 3.5 A lot of N objects contains Np objects of one kind 
(say, Np defectives) and Ng objects of another (say, Ng non- 
defectives.) Out of the lot n objects are chosen at random. What is 
the probability that there will be + defectives among the chosen 
objects ? ; 

Of the $n$ objects selected, the first one may be any one of the $N$ in
the lot, the second any one of the remaining $N-1$, and so on.
Hence the total number of elementary events, i.e. the total number
of ways in which the $n$ objects may be chosen, regard being had to
the order in which they appear, is

$$N(N-1)\cdots(N-n+1) = (N)_n.$$

Since the selection is made at random, these are to be regarded as
equally likely.

Now consider the number of ways in which $k$ defectives and $n-k$
non-defectives may be chosen in a particular order, e.g. such that the
first $k$ are defective and the last $n-k$ non-defective. This number is

$$Np(Np-1)\cdots(Np-k+1) \cdot Nq(Nq-1)\cdots(Nq-n+k+1) = (Np)_k (Nq)_{n-k}.$$

Hence the probability of having defective objects in the first $k$
drawings and non-defective ones in the last $n-k$ is

$$(Np)_k (Nq)_{n-k} / (N)_n.$$

But this obviously is also the probability of having $k$ defectives
and $n-k$ non-defectives in any other particular order. For our
problem, the order is immaterial, and hence the required probability
of having $k$ defectives and $n-k$ non-defectives is, by the theorem of
total probability for mutually exclusive events (rather, by Corollary
3.1.2),

$$c \times (Np)_k (Nq)_{n-k} / (N)_n,$$

where $c$ is the total number of orders (permutations) in which $k$
defectives and $n-k$ non-defectives may appear. Since

$$c = \frac{n!}{k!\,(n-k)!},$$

the required probability is

$$\frac{n!}{k!\,(n-k)!} \times \frac{(Np)_k (Nq)_{n-k}}{(N)_n} = \binom{Np}{k}\binom{Nq}{n-k} \Big/ \binom{N}{n}.$$
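As a rough numerical check on Example 3.5, the following Python sketch evaluates the probability both from the final formula in binomial coefficients and from the ordered-sample argument that leads to it. The function names and the illustrative figures (a lot of 10 objects with 3 defectives, a sample of 4) are assumptions made for the illustration and do not come from the text.

```python
from fractions import Fraction
from math import comb, perm


def prob_k_defectives(lot_size, defectives, sample_size, k):
    """Closed form: C(Np, k) * C(Nq, n - k) / C(N, n)."""
    good = lot_size - defectives
    return Fraction(comb(defectives, k) * comb(good, sample_size - k),
                    comb(lot_size, sample_size))


def prob_k_defectives_ordered(lot_size, defectives, sample_size, k):
    """Ordered-sample argument: c * (Np)_k * (Nq)_{n-k} / (N)_n, with c = C(n, k)."""
    good = lot_size - defectives
    orders = comb(sample_size, k)
    favourable = orders * perm(defectives, k) * perm(good, sample_size - k)
    return Fraction(favourable, perm(lot_size, sample_size))


# Illustrative figures (assumed): N = 10, Np = 3, n = 4.
for k in range(0, 5):
    print(k, prob_k_defectives(10, 3, 4, k), prob_k_defectives_ordered(10, 3, 4, k))
```

Exact fractions are used so that the two routes can be compared without rounding error; the probabilities for k = 0, 1, ..., n should also add up to 1.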




Example 3.6 The conditions being the same as in Example 3.5,
what is the probability that the sample will contain at least one
defective object?

We shall first obtain the probability of the complementary event,
viz. that there will be no defective object in the sample. By the
same argument as in Example 3.5, this probability is

$$\binom{Nq}{n} \Big/ \binom{N}{n}.$$

The probability of getting at least one defective object is, by
Corollary 3.1.1,

$$1 - \binom{Nq}{n} \Big/ \binom{N}{n}.$$


The next theorem deals with the probability of the union of events
that are not necessarily mutually exclusive. 


Theorem 3.2 Whatever be the events $A_1, A_2, \ldots, A_m$,

$$P(A_1 \cup A_2 \cup \cdots \cup A_m) = \sum_{i} P(A_i) - \sum_{i<j} P(A_i \cap A_j) + \cdots + (-1)^{m-1} P(A_1 \cap A_2 \cap \cdots \cap A_m).$$


Proof (by mathematical induction): Consider first $P(A_1 \cup A_2)$.
The event $A_1 \cup A_2$ occurs if, and only if, one of the mutually exclusive
events $A_1$ and $A_2 - [A_1 \cap A_2]$ occurs. Hence

$$P(A_1 \cup A_2) = P(A_1) + P(A_2 - [A_1 \cap A_2]).$$

But the occurrence of $A_1 \cap A_2$ implies the occurrence of $A_2$, and so
Corollary 3.1.3 gives

$$P(A_2 - [A_1 \cap A_2]) = P(A_2) - P(A_1 \cap A_2),$$

so that

$$P(A_1 \cup A_2) = P(A_1) + P(A_2) - P(A_1 \cap A_2). \tag{3.9}$$

The theorem thus holds for $m=2$.
Let the theorem be true for $m=t$ ($\ge 2$). We shall show that in
that case the theorem is necessarily true for $m=t+1$.


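Now,

$$P(A_1 \cup A_2 \cup \cdots \cup A_{t+1}) = P([A_1 \cup A_2 \cup \cdots \cup A_t] \cup A_{t+1})$$

$$= P(A_1 \cup A_2 \cup \cdots \cup A_t) + P(A_{t+1}) - P([A_1 \cap A_{t+1}] \cup [A_2 \cap A_{t+1}] \cup \cdots \cup [A_t \cap A_{t+1}]), \tag{3.10}$$

from (3.9), taken together with (3.3). Since the theorem is supposed
to hold for $m=t$, we have

$$P(A_1 \cup A_2 \cup \cdots \cup A_t) = \sum_{i \le t} P(A_i) - \sum_{i<j \le t} P(A_i \cap A_j) + \cdots + (-1)^{t-1} P(A_1 \cap A_2 \cap \cdots \cap A_t)$$

and

$$P([A_1 \cap A_{t+1}] \cup \cdots \cup [A_t \cap A_{t+1}]) = \sum_{i \le t} P(A_i \cap A_{t+1}) - \sum_{i<j \le t} P(A_i \cap A_j \cap A_{t+1}) + \cdots + (-1)^{t-1} P(A_1 \cap A_2 \cap \cdots \cap A_t \cap A_{t+1}).$$

Substituting these expressions in (3.10), we have, after a rearrangement
of terms,

$$P(A_1 \cup A_2 \cup \cdots \cup A_{t+1}) = \sum_{i \le t+1} P(A_i) - \sum_{i<j \le t+1} P(A_i \cap A_j) + \cdots + (-1)^{t} P(A_1 \cap A_2 \cap \cdots \cap A_{t+1}).$$

Thus the theorem holds for $m=t+1$ if it holds for $m=t$. But it
has already been proved to be true for $m=2$. Hence it must also be
true for $m=3, 4, \ldots$, i.e. for all positive integral values of $m \ge 2$.
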
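Theorem 3.2 can be checked on a small finite sample space. The sketch below, in Python, computes the right-hand side of the theorem as an alternating sum over all intersections and compares it with the probability of the union obtained by direct counting. The sample space of 12 equally likely outcomes and the three events are hypothetical choices made only for illustration.

```python
from fractions import Fraction
from itertools import combinations


def prob(event, space):
    """Classical probability: favourable elementary events / all elementary events."""
    return Fraction(len(event), len(space))


def union_probability(events, space):
    """Right-hand side of Theorem 3.2: alternating sum over all intersections."""
    total = Fraction(0)
    for size in range(1, len(events) + 1):
        sign = 1 if size % 2 == 1 else -1
        for group in combinations(events, size):
            total += sign * prob(set.intersection(*group), space)
    return total


space = set(range(1, 13))                  # hypothetical: 12 equally likely outcomes
a1 = {n for n in space if n % 2 == 0}      # multiples of 2
a2 = {n for n in space if n % 3 == 0}      # multiples of 3
a3 = {1, 2, 3}                             # not exclusive of a1 or a2
direct = prob(a1 | a2 | a3, space)
print(direct, union_probability([a1, a2, a3], space))
```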
Example 3.7 n objects marked 1, 2, ...... , n are distributed over n 
places marked 1, 2, ...... , n, one object being allotted to each place. 


What is the probability that none of the objects occupies the place 
corresponding to itself ? 

Let us first obtain the probability of the complementary event,
viz. that at least one of the objects will occupy the place
corresponding to itself.

Let $A_i$ denote the event that the object numbered $i$ occupies
the place numbered $i$, for $i=1, 2, \ldots, n$. Here $A_1, A_2, \ldots, A_n$,
it should be noted, are not mutually exclusive events.

Now, the $n$ objects may be distributed over the $n$ places in a total
of $n!$ different ways, which may be assumed to be equally likely.

$A_i$ occurs when the $i$th place is occupied by the $i$th object, the
remaining $(n-1)$ places being occupied by the remaining objects in
any arbitrary order. This can happen in $(n-1)!$ ways. Hence

$$P(A_i) = \frac{(n-1)!}{n!} \quad \text{for each } i.$$




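Again, both $A_i$ and $A_j$ occur when the $i$th and $j$th places are
occupied by the corresponding objects, the other $(n-2)$ places being
filled by the remaining objects in any order whatever. So

$$P(A_i \cap A_j) = \frac{(n-2)!}{n!} \quad \text{for each } i, j \ (i<j).$$

Similarly for $P(A_i \cap A_j \cap A_k)$, $P(A_i \cap A_j \cap A_k \cap A_l)$, etc.

Lastly,

$$P(A_1 \cap A_2 \cap \cdots \cap A_n) = \frac{1}{n!}.$$

We have then, by virtue of the general theorem of total probability
for events that are not necessarily mutually exclusive,

$$P(A_1 \cup A_2 \cup \cdots \cup A_n) = \binom{n}{1}\frac{(n-1)!}{n!} - \binom{n}{2}\frac{(n-2)!}{n!} + \cdots + (-1)^{n-1}\frac{1}{n!}$$

(since there are $n$ terms in $\sum_i P(A_i)$, $\binom{n}{2}$ terms in
$\sum_{i<j} P(A_i \cap A_j)$, and so on)

$$= 1 - \frac{1}{2!} + \frac{1}{3!} - \cdots + (-1)^{n-1}\frac{1}{n!}.$$

The required probability is

$$1 - P(A_1 \cup A_2 \cup \cdots \cup A_n),$$

i.e.

$$\frac{1}{2!} - \frac{1}{3!} + \cdots + (-1)^{n}\frac{1}{n!}.$$

In case $n$ is large, this may be approximately taken to be $e^{-1}$
or 0.36788.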
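The result of Example 3.7 lends itself to a direct check for small n. The following Python sketch compares the alternating series with a brute-force count over all n! arrangements, and then compares the series with $e^{-1}$ for a moderately large n. The function names are hypothetical and the enumeration is practical only for small n.

```python
from fractions import Fraction
from itertools import permutations
from math import exp, factorial


def no_match_formula(n):
    """1/2! - 1/3! + ... + (-1)^n / n!, the expression obtained in Example 3.7."""
    return sum((Fraction((-1) ** k, factorial(k)) for k in range(2, n + 1)),
               Fraction(0))


def no_match_by_enumeration(n):
    """Proportion of the n! arrangements in which no object is in its own place."""
    favourable = sum(1 for p in permutations(range(n))
                     if all(p[i] != i for i in range(n)))
    return Fraction(favourable, factorial(n))


for n in range(1, 7):
    print(n, no_match_formula(n), no_match_by_enumeration(n))
print(float(no_match_formula(12)), exp(-1))
```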


3.5 Conditional probability 

Suppose the event $A$ has non-zero probability, so that $r(A)>0$.
Consider, together with $r(A)$, the number of elementary events that
are favourable to both $A$ and $B$, i.e. the number $r(A \cap B)$. The ratio

$$\frac{r(A \cap B)}{r(A)}$$

is the proportion of elementary events that are favourable to $B$ among
elementary events that are favourable to $A$. Being analogous to the
ratio $r(B)/r$, which is the (unconditional) probability of $B$, it is called
the conditional probability of $B$, given $A$ (or under the condition that
$A$ has already occurred). In analogy with the symbol $P(B)$ for
unconditional probability, the conditional probability of $B$ given $A$ is
denoted by $P(B \mid A)$.

The following theorem, called the theorem of compound probability,
is concerned with the joint occurrence (i.e. intersection) of events.

Theorem 3.3 Whatever be the events $A_1, A_2, \ldots, A_m$ such that

$$P(A_1)>0,\ P(A_2 \mid A_1)>0,\ P(A_3 \mid A_1 \cap A_2)>0,\ \ldots,\ P(A_{m-1} \mid A_1 \cap A_2 \cap \cdots \cap A_{m-2})>0,^*$$

we have

$$P(A_1 \cap A_2 \cap \cdots \cap A_m) = P(A_1)\,P(A_2 \mid A_1)\,P(A_3 \mid A_1 \cap A_2) \cdots P(A_m \mid A_1 \cap A_2 \cap \cdots \cap A_{m-1}).$$
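Proof: First, note that since $P(A_1)>0$, i.e. since $r(A_1)>0$,

$$P(A_1 \cap A_2) = \frac{r(A_1 \cap A_2)}{r}$$

can be written in the form

$$\frac{r(A_1)}{r} \times \frac{r(A_1 \cap A_2)}{r(A_1)}.$$

But

$$\frac{r(A_1)}{r} = P(A_1) \quad \text{and} \quad \frac{r(A_1 \cap A_2)}{r(A_1)} = P(A_2 \mid A_1).$$

Hence

$$P(A_1 \cap A_2) = P(A_1)\,P(A_2 \mid A_1). \tag{3.11}$$

Also, if $P(A_1)>0$ and $P(A_2 \mid A_1)>0$, implying because of (3.11)
that $P(A_1 \cap A_2)>0$, we have

$$P(A_1 \cap A_2 \cap A_3) = P(A_1 \cap A_2)\,P(A_3 \mid A_1 \cap A_2) = P(A_1)\,P(A_2 \mid A_1)\,P(A_3 \mid A_1 \cap A_2).$$

Proceeding in this way, we get the expression for the probability
of the joint occurrence of $A_1, A_2, \ldots, A_m$.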


* These conditions may be replaced by the equivalent single condition
$P(A_1 \cap A_2 \cap \cdots \cap A_{m-1}) > 0$.




Corollary 3.3.1 (Theorem of total probability) Suppose the events
$B_1, B_2, \ldots, B_n$ are exhaustive and mutually exclusive.* In that
case, the events $A \cap B_1, A \cap B_2, \ldots, A \cap B_n$ are mutually exclusive
and $A = (A \cap B_1) \cup (A \cap B_2) \cup \cdots \cup (A \cap B_n)$. Hence

$$P(A) = \sum_{i=1}^{n} P(A \cap B_i) = \sum_{i=1}^{n} P(B_i)\,P(A \mid B_i),$$

provided none of $B_1, B_2, \ldots, B_n$ has zero probability.

Corollary 3.3.2 (Bayes' theorem) Let, as before, $B_1, B_2, \ldots, B_n$
be exhaustive and mutually exclusive events, none of which has zero
probability. Further, let $A$ be an event which, too, has non-zero
probability. Then the equations

$$P(A \cap B_i) = P(B_i)\,P(A \mid B_i)$$

and

$$P(A \cap B_i) = P(A)\,P(B_i \mid A)$$

lead to the result

$$P(A)\,P(B_i \mid A) = P(B_i)\,P(A \mid B_i),$$

or

$$P(B_i \mid A) = \frac{P(B_i)\,P(A \mid B_i)}{P(A)},$$

or, because of Corollary 3.3.1,

$$P(B_i \mid A) = \frac{P(B_i)\,P(A \mid B_i)}{\sum_{j=1}^{n} P(B_j)\,P(A \mid B_j)}, \quad \text{for } i=1, 2, \ldots, n.$$

In many cases $B_1, B_2, \ldots, B_n$ may be looked upon as the
possible causes of the effect $A$. Bayes' theorem then gives the probability
of the cause $B_i$ when the effect $A$ has occurred or the probability
of the hypothesis $B_i$ in the light of the datum $A$. We may also say that
it gives the posterior probability of $B_i$ in terms of the prior probabilities
$P(B_j)$, $j=1, 2, \ldots, n$, and the conditional probabilities of $A$.
* Actually, it is only necessary that the mutually exclusive events $B_i$ be such
that $\sum_{i=1}^{n} P(B_i)=1$ for this and the next result to hold.

Example 3.8 Three urns contain, respectively, 4 white, 2 black
balls; 2 white, 4 black balls; and 3 white, 3 black balls. One
of the urns is chosen on the results of two throws of a coin: the first
urn if head appears on each throw; the second urn if tail appears on
each throw, and the third urn in case head appears on one throw
and tail on the other. Finally, a ball is drawn at random from the
chosen urn. What is the probability for the ball being white?

We shall denote by $B_1$, $B_2$ and $B_3$ the selection of the 1st, of the
2nd and of the 3rd urn, respectively, which are also equivalent to
the appearance of two heads, of no head and of just one head in two
throws of the coin. Also, we shall denote by $A$ the drawing of a
white ball. The events $B_1$, $B_2$ and $B_3$ are exhaustive and mutually
exclusive. Further, none of them has zero probability. Hence,
by the theorem of total probability, we have

$$P(A) = P(B_1)P(A \mid B_1) + P(B_2)P(A \mid B_2) + P(B_3)P(A \mid B_3).$$

Now, if the coin is assumed to be perfect, then

$$P(B_1) = \tfrac{1}{4}, \quad P(B_2) = \tfrac{1}{4}, \quad P(B_3) = \tfrac{1}{2}.$$

If the 1st urn is chosen, then the ball is taken at random from a
set of 6 balls of which 4 are white. Hence

$$P(A \mid B_1) = \tfrac{2}{3}.$$

Similarly,

$$P(A \mid B_2) = \tfrac{1}{3} \quad \text{and} \quad P(A \mid B_3) = \tfrac{1}{2}.$$

Thus

$$P(A) = \tfrac{1}{4} \times \tfrac{2}{3} + \tfrac{1}{4} \times \tfrac{1}{3} + \tfrac{1}{2} \times \tfrac{1}{2} = \tfrac{1}{2}.$$

Example 3.9 In the preceding problem, suppose that a ball taken
at random from the chosen urn is found to be white. What is the
probability that the 1st urn was chosen?

In the same notation as used previously, the required probability
is $P(B_1 \mid A)$. Bayes' theorem gives

$$P(B_1 \mid A) = \frac{P(B_1)\,P(A \mid B_1)}{P(A)} = \frac{\tfrac{1}{4} \times \tfrac{2}{3}}{\tfrac{1}{2}} = \frac{1}{3}.$$
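The arithmetic of Examples 3.8 and 3.9 may be reproduced with exact fractions. In the Python sketch below, the dictionaries hold the prior probabilities of the three urns and the conditional probabilities of a white ball as stated in the examples; the function and variable names are hypothetical.

```python
from fractions import Fraction

# Prior probabilities of the urns, from two throws of a perfect coin.
prior = {"urn 1": Fraction(1, 4), "urn 2": Fraction(1, 4), "urn 3": Fraction(1, 2)}
# Conditional probability of drawing a white ball from each urn.
white_given = {"urn 1": Fraction(4, 6), "urn 2": Fraction(2, 6), "urn 3": Fraction(3, 6)}


def total_probability(prior, likelihood):
    """Corollary 3.3.1: P(A) = sum of P(B_i) P(A | B_i)."""
    return sum((prior[b] * likelihood[b] for b in prior), Fraction(0))


def posterior(prior, likelihood):
    """Corollary 3.3.2 (Bayes' theorem): P(B_i | A) for every i."""
    p_a = total_probability(prior, likelihood)
    return {b: prior[b] * likelihood[b] / p_a for b in prior}


p_white = total_probability(prior, white_given)
post = posterior(prior, white_given)
print(p_white, post)
```

The posterior probabilities of the three urns necessarily add up to 1, which gives a convenient check on the computation.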


3.6 Statistical independence of events
Suppose that A and B are such that P(B| A) is defined and 

P(B| A)=P(B). 
Then the probability of the occurrence of the event B is unaffected 
by the information that the event A has occurred, and we say that 
B is (statistically) independent of A. The following result then follows 
directly from (3.11) : 

$$P(A \cap B) = P(A)\,P(B). \tag{3.12}$$




Thus in this case the probability of the joint occurrence of the 
two events is given simply by the product of their individual uncon- 
ditional probabilities. Relation (3.12), being symmetrical in A and 
B, suggests that if B is independent of A, then A should also be
considered independent of B—that, in fact, one should speak of the 
(mutual) independence of A and B. It is customary to take relation 
(3.12) as defining the independence of A and B. This definition is 
accepted irrespective of whether P(A), say, is or is not equal to zero, 
although in the former case P(B| A) is not defined. 

In the same way, $m$ events $A_1, A_2, \ldots, A_m$ are said to be mutually
independent if each event is independent of each, or the intersection
of any number, of the other events, i.e. if the following $2^m - m - 1$
equations are satisfied:

$$P(A_i \cap A_j) = P(A_i)P(A_j), \quad \text{for all } i, j \ (i<j);$$
$$P(A_i \cap A_j \cap A_k) = P(A_i)P(A_j)P(A_k), \quad \text{for all } i, j, k \ (i<j<k);$$
$$\cdots$$
$$P(A_1 \cap A_2 \cap \cdots \cap A_m) = P(A_1)P(A_2) \cdots P(A_m). \tag{3.13}$$

It is obvious that if the events $A_1, A_2, \ldots, A_m$ are mutually
independent, then the theorem of compound probability takes the
following form:

Theorem 3.4 For the events $A_1, A_2, \ldots, A_m$,

$$P(A_1 \cap A_2 \cap \cdots \cap A_m) = P(A_1)P(A_2) \cdots P(A_m),$$

provided these events are independent.
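That all of the equations (3.13) are needed, and not merely those for pairs, may be seen from a small illustration. The Python sketch below tests every subset of two or more events on a finite sample space of equally likely outcomes. The example of two throws of a coin, with events "head on the first throw", "head on the second throw" and "both throws alike", is an assumed illustration not taken from the text.

```python
from fractions import Fraction
from itertools import combinations


def prob(event, space):
    """Classical probability on a space of equally likely elementary events."""
    return Fraction(len(event), len(space))


def mutually_independent(events, space):
    """Check the product rule for every subset of two or more of the events."""
    for size in range(2, len(events) + 1):
        for group in combinations(events, size):
            product = Fraction(1)
            for event in group:
                product *= prob(event, space)
            if prob(set.intersection(*group), space) != product:
                return False
    return True


space = {"HH", "HT", "TH", "TT"}        # two throws of a perfect coin
first_head = {"HH", "HT"}
second_head = {"HH", "TH"}
both_alike = {"HH", "TT"}
print(mutually_independent([first_head, second_head], space))
print(mutually_independent([first_head, second_head, both_alike], space))
```

Here each pair of events satisfies the product rule, but the three events together do not, since their intersection has probability 1/4 and not 1/8.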


Example 3.10 A die is thrown 10 times. What is the probability
of getting six points in each of exactly 4 throws?

Under the usual assumptions, the probability of getting a six in a
single throw is $\tfrac{1}{6}$. Since the throws may be supposed to be made
independently of each other, the events $A_1, A_2, \ldots, A_{10}$, where $A_i$
stands either for the appearance of a six or for the non-appearance of
a six in the $i$th throw, are to be taken as statistically independent.
Hence, by the theorem of compound probability, the probability of
getting a six in each of 4 particular throws, e.g. in each of the first 4
throws (and a number other than six in each of the other 6 throws), is

$$\left(\tfrac{1}{6}\right)^4 \left(\tfrac{5}{6}\right)^6.$$

But this is also the probability of getting 4 sixes in any other order.
Since the total number of ways in which 4 sixes may appear is

$$\binom{10}{4},$$

the required probability is, by the theorem of probability for the
union of mutually exclusive events,

$$\binom{10}{4} \left(\tfrac{1}{6}\right)^4 \left(\tfrac{5}{6}\right)^6 = 0.05427, \text{ approximately.}$$
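The figure obtained in Example 3.10 is easily verified numerically. The short Python sketch below evaluates the same expression for any number of sixes; the function name and its default argument are hypothetical.

```python
from math import comb


def prob_exactly_k_sixes(k, throws, p_six=1 / 6):
    """C(throws, k) * p^k * (1 - p)^(throws - k), for independent throws."""
    return comb(throws, k) * p_six ** k * (1 - p_six) ** (throws - k)


print(round(prob_exactly_k_sixes(4, 10), 5))
# The probabilities for k = 0, 1, ..., 10 are mutually exclusive and exhaustive.
print(sum(prob_exactly_k_sixes(k, 10) for k in range(11)))
```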


3.7. Limitations of the classical definition 

The classical definition of probability has some obvious drawbacks. 

For one thing, direct application of this definition is not possible 
if the total number of elementary events of an experiment is infinite. 
It is obvious, for instance, that the definition will have to be modi- 
fied and extended when we want to obtain the probability that a 
point selected in a given region will lie in a specified part of it. This
type of probability has been called geometrical probability (or probability 
in continuum). The method generally used in this case is indicated in 
the following examples. 


Example 3.11 Suppose a line segment AB is bisected at the point
C. What is the probability that a point taken on the line segment
lies on AC?

To evaluate the probability let us divide the whole line segment
into $n$ smaller segments, each of length

$$a = \frac{\text{length of } AB}{n}.$$

If the point is chosen at random, we may assume that the probability
that the point lies on one of these small segments is the same for
all segments. In that case the required probability will be equal to
the ratio of the number of small segments within AC to the number
of those within AB, for $a \to 0$ (or $n \to \infty$). But, by taking smaller and
smaller segments, we shall be making this ratio as near the ratio of
the length of AC to that of AB as we please. Hence the required
probability will be

$$\frac{\text{length of } AC}{\text{length of } AB} = \frac{1}{2}.$$


The general expression for a probability of this kind is

$$\frac{\text{measure of the specified part of the region}}{\text{measure of the whole region}}, \tag{3.14}$$


where by measure we mean the length, the area or the volume of the 
region, according as it is in one, in two or in three dimensions. 


Example 3.12 A line segment AB is divided by a fixed point C,
such that length $AC=a$ and length $CB=b$. Two points X and Y are
chosen at random on AC and CB, respectively. What is the probability
that the segments AX, XY and YB can form a triangle?

Let us denote the distance AX by $x$ and the distance YB by $y$.
Clearly, $0<x<a$ and $0<y<b$. Also, the distance XY is $a+b-x-y$.
Now, the segments AX, XY and YB can form a triangle if, and only
if, the sum of the lengths of any two of them is greater than the
length of the third segment, i.e. if, and only if,

(i) $x+y > a+b-x-y$, or $x+y > \dfrac{a+b}{2}$,

(ii) $x+(a+b-x-y) > y$, or $y < \dfrac{a+b}{2}$,

and (iii) $y+(a+b-x-y) > x$, or $x < \dfrac{a+b}{2}$.


[Fig. 3.1]



Also, X and Y are chosen independently. Hence (x, y) may be 
regarded as a point selected at random on a rectangle, say PQRS, 
of sides a and b. 

Case 1: If $a>b$, $(x, y)$ can satisfy conditions (i), (ii) and (iii) if,
and only if, it lies within $\triangle TUV$ (Fig. 3.1). Again, $\triangle TUV$ is isosceles,
the angle formed by the two equal sides being a right angle. Hence
area $\triangle TUV = \tfrac{1}{2}b^2$, and the required probability is

$$\frac{\text{area } \triangle TUV}{\text{area rectangle } PQRS} = \frac{\tfrac{1}{2}b^2}{ab} = \frac{b}{2a}.$$

Case 2: If $a<b$, the required probability can be similarly
shown to be

$$\frac{\tfrac{1}{2}a^2}{ab} = \frac{a}{2b}.$$

Case 3: If $a=b$, PQRS is a square and $(x, y)$ can satisfy
conditions (i), (ii) and (iii) if, and only if, it lies within $\triangle QRS$, which
again is isosceles and right-angled (Fig. 3.2). Also,

area $\triangle QRS = \tfrac{1}{2}a^2$,
area rectangle $PQRS = a^2$.

[Fig. 3.2]

Hence the required probability is now

$$\frac{\tfrac{1}{2}a^2}{a^2} = \frac{1}{2}.$$



Example 3.13 A board is covered with congruent rectangles, and
a coin, the diameter of which is less than the smaller side of a
rectangle, is thrown on the board. What is the probability that it will
be partly in one rectangle and partly in another?

Let $a$ and $b$ be the lengths of the two sides of a rectangle and
$d$ the diameter of the coin. First consider the complementary event,
i.e. the event that the coin will be completely in one rectangle.
To obtain its probability we need consider only the rectangle in
which the centre of the coin lies. The coin will be wholly within
this rectangle if the distance of the centre from each side is at least
$d/2$ (i.e. at least equal to the radius of the coin). The probability of
the complementary event is, therefore,

$$\frac{\text{area of rectangle with sides } (a-d) \text{ and } (b-d)}{\text{area of rectangle with sides } a \text{ and } b} = \frac{(a-d)(b-d)}{ab}.$$

The required probability is, then,

$$1 - \frac{(a-d)(b-d)}{ab} = \frac{d(a+b)-d^2}{ab}.$$

But the classical definition has a more serious drawback. 
Suppose we have a six-faced die known to be loaded in favour of 6. 
Here obviously the probability of 6 appearing uppermost in a throw 
is greater than 1/6. But how to determine this probability? The 
classical definition leaves this question unanswered. 

This definition of probability may also be criticised on the ground 
that it moves in a circle, being based on the idea of equally likely 
(i.e. equally probable) outcomes. 


3.8 An axiomatic approach 

The difficulties to which the classical definition leads may be
obviated if we formulate a probability calculus on an axiomatic
basis. For this, we start from the notion of probability as relative
frequency in the long run.

Suppose we perform a sequence of $n$ repetitions of an experiment.
Let $f$ be the number of times $A$ occurs among these $n$ repetitions.
$f$ is called the absolute frequency (or, simply, frequency) of $A$, while the
ratio $f/n$ is called the relative frequency of $A$.




If several sequences, each of $n$ repetitions of the experiment, are
now considered, several ratios will be obtained:

$$\frac{f_1}{n}, \ \frac{f_2}{n}, \ \frac{f_3}{n}, \ \ldots$$

In many cases, it is found that these relative frequencies differ
from one another by small amounts, provided $n$ is large. Thus in
such cases, there is a tendency in the relative frequencies to accumulate
in the neighbourhood of some fixed value. This limiting value of
$f/n$ as $n \to \infty$ is regarded as the probability of $A$ in the experiment.

It is impossible here to practically determine the exact probability 
of an event. We cannot even prove the existence of a limit to the 
relative frequencies in any given case. On the other hand, we can 
regard any observed relative frequency as an estimate of the supposed 
limit. In this sense, one can even speak of the probability that a 
newborn child is a boy, that a man of sixty will not die within a 
year or that the height of a man is less than six feet. 

Probability being interpreted in this way, the relations:

(I) for any event $A$,
$$P(A) \ge 0;$$

(II) if $A$ is a sure event,
$$P(A) = 1;$$

(III) for mutually exclusive events $A_1$ and $A_2$,
$$P(A_1 \cup A_2) = P(A_1) + P(A_2)$$
(more generally, for mutually exclusive events $A_1, A_2, \ldots$,
$$P(A_1 \cup A_2 \cup \cdots) = P(A_1) + P(A_2) + \cdots);$$

and (IV) $P(A_1 \cap A_2) = P(A_1)\,P(A_2 \mid A_1)$, provided $P(A_1) > 0$,

are taken as axioms, formulated, of course, in the light of similar
properties of relative frequencies. The other relations established in
the previous sections will then follow from these axioms.

Thus, in whatever way we define probability, in each case it will 
be assumed to obey essentially the same laws. 

Any function $P$, or $P(\cdot)$, that obeys axioms (I) to (III) is called a
probability function on the events in the sample space, and the function
$P(\cdot \mid A_1)$ defined by axiom (IV) is called the conditional probability
function corresponding to $P$ and associated with the event $A_1$.




Such a function $P$, for purposes of probability theory, need not
be considered in the context of any experiment. For instance, in
the case of a finite sample space, with elementary events $e_\alpha$
($\alpha=1, 2, \ldots, r$), we may associate with $e_\alpha$ a non-negative number
$p(e_\alpha)$ such that $\sum_{\alpha=1}^{r} p(e_\alpha) = 1$. Then, for any event $A$, we may take
$P(A) = \sum p(e_\alpha)$, the sum being taken over all $e_\alpha$ that belong to $A$.
Actually, $p(e_\alpha) = P(\{e_\alpha\})$, the probability of the event having the
single elementary event $e_\alpha$ (or, loosely speaking, the probability
of $e_\alpha$).
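The construction just described is what allows a loaded die to be handled, which the classical definition could not do. The Python sketch below builds a probability function from non-negative weights that add up to 1 and checks axioms (I) to (III) on a few events. The weight of 1/4 for the face six is a purely hypothetical figure, as are the function and variable names.

```python
from fractions import Fraction


def make_probability_function(weights):
    """Return P with P(A) = sum of p(e) over the elementary events e in A."""
    if any(w < 0 for w in weights.values()):
        raise ValueError("weights must be non-negative")
    if sum(weights.values()) != 1:
        raise ValueError("weights must add up to 1")

    def P(event):
        return sum((weights[e] for e in event), Fraction(0))

    return P


# Hypothetical loaded die: six has weight 1/4, each other face 3/20.
weights = {face: Fraction(3, 20) for face in range(1, 6)}
weights[6] = Fraction(1, 4)
P = make_probability_function(weights)

even, low_odd = {2, 4, 6}, {1, 3}                  # mutually exclusive events
print(P(even), P(low_odd), P(even | low_odd))      # axiom (III): the last is the sum
print(P(set(weights)))                             # axiom (II): sure event
```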

This axiomatic treatment of probability theory is due mainly to
the Russian mathematician Kolmogorov.

In statistical theory, the axiomatic definition is of greater help
because of its generality than the classical definition, although the
latter is simpler to understand and to deal with.


3.9 Random variable, and its expectation and variance 

In most cases, with each elementary event in the sample space we
may associate a real number. In tossing a coin, for instance, we may
associate the number 1 with the appearance of a head and the number
0 with the appearance of a tail. In throwing a die, we have the
numbers 1, 2, ..., 6 corresponding to the six possibilities regarding
the face that appears uppermost. We may thus define a function on
the sample space. A (real-valued) function defined on the sample space
is called a random variable or a stochastic variable. Obviously, to each
value of a random variable $x$ there corresponds a definite probability.*
Let $x_1, x_2, \ldots, x_k$ be the possible values of $x$, and let $p_1, p_2, \ldots, p_k$
be the corresponding probabilities. A statement of the possible
values, together with the probabilities, gives the probability distribution
of $x$. The probability $p_i$ is, of course, to be interpreted as
approximately the proportion of cases in which $x$ takes the value $x_i$
in a large series of repetitions of the experiment.

*Actually, here we have a special type of random variable, which is the only
appropriate type for a finite sample space. For some other types, see Section 9.2.

One important characteristic of a random variable is its
expectation. Thus, let $x(e_\alpha)$ be the value of $x$ corresponding to the
elementary event $e_\alpha$ ($\alpha=1, 2, \ldots, r$), and let $p(e_\alpha)$ be the
probability associated with $e_\alpha$. Then

$$E(x) = \sum_{\alpha=1}^{r} x(e_\alpha)\,p(e_\alpha) \tag{3.15}$$

is the expectation of $x$. $E(x)$ is often denoted by $\mu_x$ or, simply, $\mu$.

Suppose now that the elementary events are numbered in such
a way that $x(e_\alpha)$ for $\alpha=1, 2, \ldots, r_1$ are all equal to $x_1$, $x(e_\alpha)$ for
$\alpha=r_1+1, r_1+2, \ldots, r_2$ are all equal to $x_2$, and so on. Then (3.15)
may be written alternatively as

$$E(x) = x_1 \sum_{\alpha=1}^{r_1} p(e_\alpha) + x_2 \sum_{\alpha=r_1+1}^{r_2} p(e_\alpha) + \cdots + x_k \sum_{\alpha=r_{k-1}+1}^{r_k} p(e_\alpha).$$

But the coefficient of $x_i$ here is nothing but the probability

$$P[x=x_i] = p_i.$$

Hence we have also

$$E(x) = \sum_{i=1}^{k} x_i p_i. \tag{3.16}$$

In this form, the expectation of $x$ is seen to be the sum of the products
of the different possible values of $x$ by their probabilities. Since
$p_i$ is the 'long-run relative frequency' with which $x$ assumes the value
$x_i$, $E(x)$ may also be interpreted as the 'long-run average value' of $x$
(vide Section 6.3).

There is yet a third formula for $E(x)$. Let $A_1, A_2, \ldots, A_l$ be a
set of exhaustive and mutually exclusive events such that $x$ takes the
same value $x_j$ for all elementary events that are favourable to $A_j$
(for each $j=1, 2, \ldots, l$). Then (3.15) may be expressed in the form

$$E(x) = \sum_{j=1}^{l} x_j P(A_j). \tag{3.17}$$

Indeed, this is the most general formula for $E(x)$. In actual applications,
we use one of these formulae, generally the one that best
serves our purpose in the given context.

We have the following theorems on expectation : 

Theorem 3.5 If $x=a$, a constant, then $E(x)=a$.

Proof: Since $x=a$, we have $x(e_\alpha)=a$ for all $\alpha$. Hence (3.15)
gives

$$E(x) = a \sum_{\alpha=1}^{r} p(e_\alpha) = a,$$

since $\sum_{\alpha=1}^{r} p(e_\alpha) = 1$.

Theorem 3.6 If $y=bx$, then $E(y)=bE(x)$.

Proof: Corresponding to the elementary event $e_\alpha$, $x(e_\alpha)$ is the
value of $x$ and $y(e_\alpha)$ is the value of $y$. Then

$$E(y) = \sum_{\alpha=1}^{r} y(e_\alpha)\,p(e_\alpha) = b \sum_{\alpha=1}^{r} x(e_\alpha)\,p(e_\alpha), \quad \text{since } y(e_\alpha)=b\,x(e_\alpha),$$

$$= bE(x).$$

Theorem 3.7 If $x$ and $y$ be two random variables and $z$ a third
random variable such that $z=x+y$, then $E(z)=E(x)+E(y)$.

Proof: Corresponding to the elementary event $e_\alpha$, $x$, $y$ and $z$
have the values $x(e_\alpha)$, $y(e_\alpha)$ and $z(e_\alpha)$, respectively.
Further, $z(e_\alpha)=x(e_\alpha)+y(e_\alpha)$. Hence

$$E(z) = \sum_{\alpha=1}^{r} z(e_\alpha)\,p(e_\alpha) = \sum_{\alpha=1}^{r} x(e_\alpha)\,p(e_\alpha) + \sum_{\alpha=1}^{r} y(e_\alpha)\,p(e_\alpha) = E(x)+E(y).$$

Theorem 3.8 If $y=a+bx$, then $E(y)=a+bE(x)$.

Proof: $E(y)=E(a)+E(bx)$ from Theorem 3.7
$=a+bE(x)$ from Theorems 3.5 and 3.6.


Another characteristic of a random variable is its variance, which 
serves as a measure of the variation or dispersion of the random 
variable about its expectation. The variance of a random variable x 
is defined by 

$$\mathrm{var}(x) = E[x-E(x)]^2. \tag{3.18}$$

Often $\mathrm{var}(x)$ is denoted by $\sigma_x^2$ or, simply, $\sigma^2$. The positive square-root
of the variance is called the standard deviation (denoted by $\sigma_x$ or $\sigma$).
Since

$$[x-E(x)]^2 = x^2 - 2xE(x) + [E(x)]^2,$$

we have, by virtue of Theorems 3.5-3.8,

$$\mathrm{var}(x) = E(x^2) - 2E(x)E(x) + [E(x)]^2 = E(x^2) - [E(x)]^2. \tag{3.19}$$
We have the following theorems on variance : 
Theorem 3.9 If $x=a$, a constant, then $\mathrm{var}(x)=0$.

Proof: From Theorem 3.5, $E(x)=a$. Hence $[x-E(x)]^2=0$ and so,
from Theorem 3.5 again,

$$\mathrm{var}(x) = E[x-E(x)]^2 = 0.$$

Theorem 3.10 If $y=bx$, then $\mathrm{var}(y)=b^2\,\mathrm{var}(x)$.

Proof: From Theorem 3.6, $E(y)=bE(x)$.

Hence $y-E(y)=b[x-E(x)]$
and $[y-E(y)]^2=b^2[x-E(x)]^2$.

On applying Theorem 3.6 again, we have

$$E[y-E(y)]^2 = b^2 E[x-E(x)]^2,$$

i.e. $\mathrm{var}(y)=b^2\,\mathrm{var}(x)$.

Theorem 3.11 If $y=a+bx$, then $\mathrm{var}(y)=b^2\,\mathrm{var}(x)$.

Proof: Theorem 3.8 gives $E(y)=a+bE(x)$. Hence $y-E(y)=b[x-E(x)]$.
Next, proceeding as in the proof of Theorem 3.10, we
have the stated result.
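Theorems 3.8 and 3.11, together with formula (3.19), may be illustrated on the throw of a perfect die. In the Python sketch below, the constants a = 2 and b = 3 are arbitrary assumed values and the function names are hypothetical; exact fractions are used throughout.

```python
from fractions import Fraction


def expectation(values, probs):
    """Formula (3.16): sum of the possible values multiplied by their probabilities."""
    return sum((v * p for v, p in zip(values, probs)), Fraction(0))


def variance(values, probs):
    """Definition (3.18): expectation of the squared deviation from E(x)."""
    mean = expectation(values, probs)
    return expectation([(v - mean) ** 2 for v in values], probs)


faces = [1, 2, 3, 4, 5, 6]
probs = [Fraction(1, 6)] * 6
a, b = 2, 3                                   # arbitrary constants
transformed = [a + b * v for v in faces]      # values of y = a + bx

print(expectation(faces, probs), variance(faces, probs))
print(expectation(transformed, probs), variance(transformed, probs))
```

For the die, E(x) = 7/2 and var(x) = 35/12, so the second line should show a + bE(x) and b² var(x).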

Example 3.14 Suppose two players, A and B, agree to play a game
under the condition that A will get from B $a$ rupees if he wins and
will pay to B $b$ rupees if he loses. Let the probability of A's winning
the game be $p$ and that of B's winning the game be $q=1-p$.

A's gain is then a random variable $x$, assuming two values, $a$
(with probability $p$) and $-b$ (with probability $q$). Hence

$$E(x) = ap - bq$$

and, since $a-E(x)=(a+b)q$ and $-b-E(x)=-(a+b)p$,

$$\mathrm{var}(x) = [a-E(x)]^2 p + [-b-E(x)]^2 q = (a+b)^2 q^2 p + (a+b)^2 p^2 q = (a+b)^2 pq.$$

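Example 3.15 Let a die be thrown repeatedly till the first six
appears. The number of throws needed to get the first six is then a
random variable $x$, taking the values 1, 2, 3, ... ad inf. Further, if
the die is perfect, the probability of getting a six in a throw is 1/6
and that of getting one of the other values (viz. 1, 2, ..., 5) is 5/6.
Hence

$$P[x=k] = \left(\tfrac{5}{6}\right)^{k-1} \times \tfrac{1}{6},$$

for the throws are made independently of each other and $x=k$ if,
and only if, a six is obtained in the $k$th throw but in none of the
earlier throws.

The mathematical expectation of $x$ is, therefore,

$$E(x) = \sum_{k=1}^{\infty} k \left(\tfrac{5}{6}\right)^{k-1} \tfrac{1}{6} = \tfrac{1}{6}\left[1 + 2\left(\tfrac{5}{6}\right) + 3\left(\tfrac{5}{6}\right)^2 + \cdots\right] = \tfrac{1}{6}\left(1-\tfrac{5}{6}\right)^{-2} = 6.$$

Thus 'on the average' six throws will be needed to get the
first six.

Again,

$$x^2 = x(x-1) + x,$$

so that

$$E(x^2) = E[x(x-1)] + E(x) = \sum_{k=1}^{\infty} k(k-1)\left(\tfrac{5}{6}\right)^{k-1} \tfrac{1}{6} + 6$$

$$= 2 \times \tfrac{1}{6} \times \tfrac{5}{6}\left[1 + 3\left(\tfrac{5}{6}\right) + 6\left(\tfrac{5}{6}\right)^2 + \cdots\right] + 6$$

$$= 2 \times \tfrac{1}{6} \times \tfrac{5}{6}\left(1-\tfrac{5}{6}\right)^{-3} + 6 = 60 + 6 = 66.$$

Hence

$$\mathrm{var}(x) = E(x^2) - [E(x)]^2 = 66 - 36 = 30.$$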
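The infinite series of Example 3.15 converge quickly, so the values 6 and 30 can be checked by summing a large but finite number of terms. The Python sketch below does this in floating-point arithmetic; the cut-off of 2000 terms and the function name are arbitrary assumptions, and the result is therefore only a close numerical approximation.

```python
def waiting_time_moments(p, terms=2000):
    """Approximate E(x) and var(x) when P[x = k] = (1 - p)^(k - 1) * p, k = 1, 2, ..."""
    q = 1 - p
    mean = sum(k * q ** (k - 1) * p for k in range(1, terms + 1))
    second_moment = sum(k * k * q ** (k - 1) * p for k in range(1, terms + 1))
    return mean, second_moment - mean ** 2      # formula (3.19)


mean, var = waiting_time_moments(1 / 6)
print(mean, var)
```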


3.10 Joint distribution of two random variables 

In some investigations, we have to study more than one random
variable at the same time. When a number of balls are taken from an
urn containing red, white and black balls, for instance, the number of
red balls obtained as well as the number of white balls may be a
variable of interest. In studying the relationship of two variables, say, $x$
and $y$, we have to consider the possible values of $x$, say, $x_1, x_2, \ldots, x_k$,
and the possible values of $y$, say, $y_1, y_2, \ldots, y_l$, as also the
probability for each pair of values $(x_i, y_j)$, where $i=1, 2, \ldots, k$ and
$j=1, 2, \ldots, l$. We shall denote this probability by $p_{ij}$. These $p_{ij}$
give what is called the joint probability distribution of $x$ and $y$. This
distribution may be represented by a two-way table like Table 3.1.


TABLE 3.1
JOINT DISTRIBUTION OF TWO RANDOM VARIABLES

  y \ x           |  x_1    x_2    ...    x_k   |  Marginal total
  ----------------+-----------------------------+----------------
  y_1             |  p_11   p_21   ...    p_k1  |  p_01
  y_2             |  p_12   p_22   ...    p_k2  |  p_02
  ...             |  ...    ...    ...    ...   |  ...
  y_l             |  p_1l   p_2l   ...    p_kl  |  p_0l
  ----------------+-----------------------------+----------------
  Marginal total  |  p_10   p_20   ...    p_k0  |  1

Let

$$p_{i0} = \sum_{j=1}^{l} p_{ij}. \tag{3.20}$$

Clearly, then

$$p_{i0} = P[x=x_i].$$

Also, let

$$p_{0j} = \sum_{i=1}^{k} p_{ij}, \tag{3.21}$$

so that

$$p_{0j} = P[y=y_j].$$

These $p_{i0}$ give the probability distribution (called the marginal
probability distribution) of $x$. Likewise, the $p_{0j}$ give the marginal
distribution of $y$.

In studying the interdependence of $x$ and $y$, we have to examine
how $P[x=x_i, y=y_j]$ differs from the product $P[x=x_i]P[y=y_j]$, for
each pair $(i, j)$. In case

$$P[x=x_i, y=y_j] = P[x=x_i]P[y=y_j], \quad \text{for all } i, j,$$

i.e.

$$p_{ij} = p_{i0}\,p_{0j}, \quad \text{for all } i, j, \tag{3.22}$$

we say that $x$ and $y$ are statistically independent. Otherwise, they are




An important feature of the joint distribution of $x$ and $y$ is their
covariance, which is used in measuring the association between $x$ and $y$
and is defined by

$$\mathrm{cov}(x, y) = E\{[x-E(x)][y-E(y)]\}. \tag{3.23}$$

Since

$$[x-E(x)][y-E(y)] = xy - xE(y) - yE(x) + E(x)E(y),$$

we have, by virtue of Theorems 3.5-3.8,

$$\mathrm{cov}(x, y) = E(xy) - E(x)E(y). \tag{3.24}$$

The ratio

$$\rho_{xy} = \frac{\mathrm{cov}(x, y)}{\sigma_x \sigma_y}, \tag{3.25}$$

called the correlation coefficient of $x$ and $y$, is used as a measure of the
association between the two random variables. The various
properties of $\rho$ are discussed in Chapter 11.

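Example 3.16 An urn contains 2 red, 3 white and 3 black balls.
If 3 balls are taken at random from the urn, what will be the
covariance between the number of red balls and the number of
white balls obtained?

Let the number of red balls and that of white balls be denoted
by $x$ and $y$, respectively. The possible values of $x$ are 0, 1 and 2,
while the possible values of $y$ are 0, 1, 2 and 3.

Also,

$$p_{ij} = P[x=x_i, y=y_j] = \binom{2}{x_i}\binom{3}{y_j}\binom{3}{3-x_i-y_j} \Big/ \binom{8}{3}, \quad \text{for } x_i=0, 1, 2; \ y_j=0, 1, 2, 3,$$

the probability being zero whenever $x_i+y_j>3$.

The joint distribution and the marginal distributions of $x$ and $y$
are represented in Table 3.2.

TABLE 3.2
JOINT DISTRIBUTION OF THE NUMBER OF RED BALLS (x)
AND THE NUMBER OF WHITE BALLS (y)

  y \ x           |  0       1       2      |  Marginal total
  ----------------+-------------------------+----------------
  0               |  1/56    6/56    3/56   |  10/56
  1               |  9/56    18/56   3/56   |  30/56
  2               |  9/56    6/56    0      |  15/56
  3               |  1/56    0       0      |  1/56
  ----------------+-------------------------+----------------
  Marginal total  |  20/56   30/56   6/56   |  1

Hence

$$E(x) = 1 \times \tfrac{30}{56} + 2 \times \tfrac{6}{56} = \tfrac{42}{56} = \tfrac{3}{4},$$

$$E(y) = 1 \times \tfrac{30}{56} + 2 \times \tfrac{15}{56} + 3 \times \tfrac{1}{56} = \tfrac{63}{56} = \tfrac{9}{8},$$

and

$$E(xy) = 1 \times \tfrac{18}{56} + 2 \times \tfrac{3}{56} + 2 \times \tfrac{6}{56} = \tfrac{36}{56} = \tfrac{9}{14}.$$

Thus

$$\mathrm{cov}(x, y) = \tfrac{9}{14} - \tfrac{3}{4} \times \tfrac{9}{8} = -\tfrac{45}{224}.$$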
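Table 3.2 and the covariance of Example 3.16 can be reproduced by direct counting. The Python sketch below builds the joint distribution from the numbers of ways of choosing the balls and then applies formula (3.24); the function names and the dictionary layout are hypothetical choices.

```python
from fractions import Fraction
from math import comb


def joint_distribution(red=2, white=3, black=3, draws=3):
    """P[x = number of red, y = number of white] for balls drawn without replacement."""
    total = comb(red + white + black, draws)
    joint = {}
    for x in range(red + 1):
        for y in range(white + 1):
            rest = draws - x - y              # number of black balls drawn
            ways = comb(red, x) * comb(white, y) * comb(black, rest) if rest >= 0 else 0
            joint[(x, y)] = Fraction(ways, total)
    return joint


def covariance(joint):
    """Formula (3.24): cov(x, y) = E(xy) - E(x) E(y)."""
    e_x = sum(x * p for (x, y), p in joint.items())
    e_y = sum(y * p for (x, y), p in joint.items())
    e_xy = sum(x * y * p for (x, y), p in joint.items())
    return e_xy - e_x * e_y


joint = joint_distribution()
print(joint[(1, 1)], covariance(joint))
```

The negative sign of the covariance is as one would expect: the more red balls there are among the three drawn, the fewer white balls there can be.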

The following theorem on the probability distribution of two
independent random variables is of frequent use in statistical theory:

Theorem 3.12 If $x$ and $y$ are independent random variables, then

$$E(xy) = E(x)E(y).$$

Proof: Using the notation already introduced, the expectation
of the product $xy$ may be written as

$$E(xy) = \sum_{i=1}^{k} \sum_{j=1}^{l} x_i y_j p_{ij},$$

where we are taking formula (3.17) for expectation. (The values
$x_i y_j$ may not be all different, but they arise from an exhaustive set
of mutually exclusive cases.) Now, since $x$ and $y$ are supposed to be
independent, $p_{ij}=p_{i0}p_{0j}$ for all $i, j$. Hence

$$E(xy) = \sum_{i=1}^{k} \sum_{j=1}^{l} x_i y_j p_{i0} p_{0j}.$$

Since $x_i y_j p_{i0} p_{0j} = (x_i p_{i0})(y_j p_{0j})$, where the first factor depends
on $i$ alone and the second depends on $j$ alone, we may write the
above double sum as the product of two sums:

$$E(xy) = \left(\sum_{i=1}^{k} x_i p_{i0}\right)\left(\sum_{j=1}^{l} y_j p_{0j}\right).$$

But the first factor on the right-hand side is, by definition (3.16),
the expectation of $x$ and the second factor is similarly the expectation
of $y$. As such,

$$E(xy) = E(x)E(y).$$

Corollary 3.12.1 If $x$ and $y$ are independent, we have, from (3.24)
and (3.25),

$$\mathrm{cov}(x, y) = 0$$

and

$$\rho_{xy} = 0.$$

In Section 3.9 we had an expression for the expectation of the
sum of a number of random variables in terms of the expectations of
the components. Here we give a corresponding expression for the
variance of the sum of a number of random variables in terms of
their variances and covariances.


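Theorem 3.13 Let $x_1, x_2, \ldots, x_m$ be $m$ random variables (defined
on the same sample space). Then

$$\mathrm{var}(x_1+x_2+\cdots+x_m) = \sum_{i=1}^{m} \mathrm{var}(x_i) + 2\sum_{i<j} \mathrm{cov}(x_i, x_j).$$

Proof: Let us again start with two random variables, $x$ and $y$.
We have, from Theorem 3.7,

$$E(x+y) = E(x)+E(y).$$

Hence

$$\mathrm{var}(x+y) = E[(x+y)-E(x+y)]^2$$
$$= E[\{x-E(x)\}+\{y-E(y)\}]^2$$
$$= E[\{x-E(x)\}^2 + 2\{x-E(x)\}\{y-E(y)\} + \{y-E(y)\}^2]$$
$$= \mathrm{var}(x) + 2\,\mathrm{cov}(x, y) + \mathrm{var}(y),$$

by virtue of Theorem 3.7.

In the same way, for $m$ random variables, $x_1, x_2, \ldots, x_m$,

$$\mathrm{var}(x_1+x_2+\cdots+x_m) = E\Big[\sum_{i=1}^{m}\{x_i-E(x_i)\}\Big]^2$$
$$= E\Big[\sum_{i=1}^{m}\{x_i-E(x_i)\}^2 + 2\sum_{i<j}\{x_i-E(x_i)\}\{x_j-E(x_j)\}\Big]$$
$$= \sum_{i=1}^{m} \mathrm{var}(x_i) + 2\sum_{i<j} \mathrm{cov}(x_i, x_j).$$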


128 FUNDAMENTALS OF STATISTIOS 


Corollary 3.13.1 Tf xq, Xes -+ ++:3 Nm ATE (at least pairwise) independent, 
then cov(x;, xj)=-0 for i#j. Hence in this case we have, simply, 

var (x4 txot ee Xm) © 5 var(x;). 
v&i 
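
The two-variable case, var(x + y) = var(x) + 2 cov(x, y) + var(y), can be verified on a small example. The sketch below assumes a hypothetical joint distribution of two dependent variables (the numbers are invented for illustration) and compares the variance of the sum computed directly with the value given by the formula.

```python
from fractions import Fraction

# Hypothetical joint distribution {(x, y): probability}; x and y are dependent.
joint = {
    (0, 0): Fraction(1, 4),
    (0, 1): Fraction(1, 4),
    (1, 1): Fraction(1, 2),
}


def E(f):
    """Expectation of f(x, y) under the joint distribution above."""
    return sum(f(x, y) * p for (x, y), p in joint.items())


mean_x = E(lambda x, y: x)
mean_y = E(lambda x, y: y)
var_x = E(lambda x, y: (x - mean_x) ** 2)
var_y = E(lambda x, y: (y - mean_y) ** 2)
cov_xy = E(lambda x, y: (x - mean_x) * (y - mean_y))

# Variance of the sum computed directly from its definition.
mean_sum = E(lambda x, y: x + y)
var_sum = E(lambda x, y: (x + y - mean_sum) ** 2)

print(var_sum, var_x + 2 * cov_xy + var_y)
```

Because the covariance here is not zero, dropping the cross term would give the wrong variance; the simplified form applies only under (pairwise) independence.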

Example 3.17 There is a lot of N objects, from which objects are
taken at random, one by one and with replacement. What are the
expected value and variance of the least number of drawings needed
to get n different objects?

(Note that if the objects were taken without replacement, then
just n drawings would be needed to get n different objects. Hence in
that case the expected value would be n and the variance 0.)

Denoting the number of drawings required to obtain n different
objects by s, we may write

$$s = 1 + x_1 + x_2 + \cdots + x_{n-1},$$

where $x_i$ = the least number of drawings required to obtain a new
object when i different objects have already been obtained. From
the nature of the problem, it is obvious that $x_1, x_2, \ldots, x_{n-1}$ are
mutually (hence pairwise) independent, and for each i the possible values
of $x_i$ are 1, 2, 3, ..., with

$$P[x_i = r] = \Big(\frac{i}{N}\Big)^{r-1}\Big(1 - \frac{i}{N}\Big), \quad r = 1, 2, 3, \ldots$$

Proceeding as in Example 3.15, we then have

$$E(x_i) = \Big(1 - \frac{i}{N}\Big)\Big\{1 + 2\Big(\frac{i}{N}\Big) + 3\Big(\frac{i}{N}\Big)^2 + \cdots\Big\} = \frac{N}{N - i}$$

and, similarly,

$$E(x_i^2) = \frac{N(N + i)}{(N - i)^2},$$

so that

$$\mathrm{var}(x_i) = \frac{Ni}{(N - i)^2}.$$

Using Theorem 3.7 and Corollary 3.13.1, we get

$$E(s) = 1 + E(x_1) + E(x_2) + \cdots + E(x_{n-1}) = N\Big[\frac{1}{N} + \frac{1}{N-1} + \frac{1}{N-2} + \cdots + \frac{1}{N-n+1}\Big]$$

and

$$\mathrm{var}(s) = \mathrm{var}(x_1) + \mathrm{var}(x_2) + \cdots + \mathrm{var}(x_{n-1}) = N\Big[\frac{1}{(N-1)^2} + \frac{2}{(N-2)^2} + \cdots + \frac{n-1}{(N-n+1)^2}\Big].$$

(Use is made of the same technique in two other problems in
Sections 15.3 and 15.4.)
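
The result of this example can be evaluated exactly for given N and n. The sketch below assumes that each waiting time x_i has expectation N/(N - i) and variance Ni/(N - i)^2, which is what the waiting-time argument of the example yields; the function name `draws_needed_moments` is hypothetical.

```python
from fractions import Fraction


def draws_needed_moments(N, n):
    """Return (E(s), var(s)) for the number of drawings, with replacement,
    needed to obtain n different objects from a lot of N objects."""
    if not 1 <= n <= N:
        raise ValueError("need 1 <= n <= N")
    # The i = 0 term equals 1 and accounts for the first drawing.
    mean = sum(Fraction(N, N - i) for i in range(n))
    # The first drawing is not random, so the variance sum starts at i = 1.
    var = sum(Fraction(N * i, (N - i) ** 2) for i in range(1, n))
    return mean, var


# Illustration: throws of a die needed to see all six faces (N = n = 6).
mean, var = draws_needed_moments(6, 6)
print(float(mean), float(var))
```

For N = n = 6 this gives an expected 14.7 throws with variance 38.99.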


3.11 Law of large numbers

Before this chapter is brought to a close, we should state and
prove certain theorems which have come to have an important 
bearing on statistical theory. 

Theorem 3.14 (Chebyshev's lemma): Let x be a random variable
that takes non-negative values only (more generally, let P[x < 0] = 0),
and let $\mu$ be its mathematical expectation. Then, whatever be the
non-zero quantity t, we must have

$$P[x \le \mu t^2] \ge 1 - \frac{1}{t^2}.$$

Proof: In case $\mu = 0$, the result is trivially true. For the only
possible value of x is then 0, so that

$$P[x \le \mu t^2] = P[x = 0] = 1 > 1 - \frac{1}{t^2} \quad \text{for all } t \ne 0.$$

We need, therefore, consider only the case when $\mu > 0$. Let the
different possible values of x (or, more properly speaking, the
different values of x with positive probabilities) be $x_1, x_2, \ldots, x_k$
and let $p_1, p_2, \ldots, p_k$ be the corresponding probabilities. Suppose
further that

$$x_i \le \mu t^2 \quad \text{for } i = 1, 2, \ldots, a \qquad (3.26)$$

and

$$x_i > \mu t^2 \quad \text{for } i = a+1, a+2, \ldots, k. \qquad (3.27)$$

Now, by definition,

$$\mu = \sum_{i=1}^{k} x_i p_i,$$

and since $x_i$ and $p_i$ are all non-negative quantities,

$$\mu \ge \sum_{i=a+1}^{k} x_i p_i.$$

Again, by virtue of (3.27),

$$\sum_{i=a+1}^{k} x_i p_i \ge \mu t^2 \sum_{i=a+1}^{k} p_i.$$

But

$$\sum_{i=a+1}^{k} p_i = P[x > \mu t^2].$$

It follows that

$$\mu t^2 P[x > \mu t^2] \le \mu$$

or

$$P[x > \mu t^2] \le \frac{1}{t^2} \quad (\text{since } \mu > 0).$$

Because

$$P[x \le \mu t^2] = 1 - P[x > \mu t^2],$$

we then have

$$P[x \le \mu t^2] \ge 1 - \frac{1}{t^2}.$$

Corollary 3.14.1 (Chebyshev's inequality): Since, for any random
variable x with expectation $\mu$, $(x - \mu)^2$ is itself a random variable
assuming non-negative values only and having expectation

$$E(x - \mu)^2 = \sigma^2,$$

which is the variance of x, we get, from Chebyshev's lemma,

$$P[(x - \mu)^2 \le \sigma^2 t^2] \ge 1 - \frac{1}{t^2}$$

or, equivalently,

$$P[|x - \mu| \le t\sigma] \ge 1 - \frac{1}{t^2},$$

where t is any positive quantity.

This gives a fair idea of the sense in which $\sigma$ may be looked upon
as a measure of dispersion (vide Sections 8.4 and 10.2).
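
Chebyshev's inequality can be illustrated on a concrete distribution. The sketch below is only an illustration, assuming the score on a fair die as the random variable and t = 3/2; the function name `chebyshev_check` is hypothetical. The event |x - mu| <= t*sigma is tested in the squared form (x - mu)^2 <= t^2 * variance, so that the whole computation stays in exact fractions.

```python
from fractions import Fraction


def chebyshev_check(values, probs, t):
    """Return (P[|x - mu| <= t*sigma], 1 - 1/t^2) for a discrete variable."""
    mu = sum(v * p for v, p in zip(values, probs))
    var = sum((v - mu) ** 2 * p for v, p in zip(values, probs))
    # Squared form of the event avoids taking a square root.
    inside = sum(p for v, p in zip(values, probs) if (v - mu) ** 2 <= t * t * var)
    bound = 1 - Fraction(1) / (t * t)
    return inside, bound


# Assumed example: a fair die, with t = 3/2.
faces = [1, 2, 3, 4, 5, 6]
probs = [Fraction(1, 6)] * 6
inside, bound = chebyshev_check(faces, probs, Fraction(3, 2))
print(inside, bound)
```

Here the actual probability is 1 while the guaranteed lower bound is only 5/9, which shows that the inequality holds for every distribution with a finite variance but is often far from sharp.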


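Theorem 3.15 (Weak law of large numbers): Let $x_1, x_2, \ldots$ be a
sequence of random variables having expectations $\mu_1, \mu_2, \ldots$
Further, let

$$V_n = \mathrm{var}(x_1 + x_2 + \cdots + x_n).$$

If $V_n/n^2 \to 0$ as $n \to \infty$,
then, given any two positive quantities $\varepsilon$ and $\eta$, however small, we
can find an $n_0$, depending on $\varepsilon$ and $\eta$, such that

$$P\Big[\Big|\frac{x_1 + x_2 + \cdots + x_n}{n} - \frac{\mu_1 + \mu_2 + \cdots + \mu_n}{n}\Big| \le \varepsilon\Big] > 1 - \eta$$

for all $n > n_0$.

Proof: Since

$$E\Big(\frac{x_1 + x_2 + \cdots + x_n}{n}\Big) = \frac{\mu_1 + \mu_2 + \cdots + \mu_n}{n}$$

and

$$\mathrm{var}\Big(\frac{x_1 + x_2 + \cdots + x_n}{n}\Big) = \frac{V_n}{n^2},$$

we have, from Chebyshev's inequality,

$$P\Big[\Big|\frac{x_1 + x_2 + \cdots + x_n}{n} - \frac{\mu_1 + \mu_2 + \cdots + \mu_n}{n}\Big| \le t\,\frac{\sqrt{V_n}}{n}\Big] \ge 1 - \frac{1}{t^2}$$

for any positive t.

Choosing t in such a way that $t\sqrt{V_n}/n = \varepsilon$, we therefore have, for
any $\varepsilon > 0$,

$$P\Big[\Big|\frac{x_1 + x_2 + \cdots + x_n}{n} - \frac{\mu_1 + \mu_2 + \cdots + \mu_n}{n}\Big| \le \varepsilon\Big] \ge 1 - \frac{V_n}{n^2 \varepsilon^2}.$$

Since $V_n/n^2 \to 0$ as $n \to \infty$, given $\eta\varepsilon^2 > 0$, however small, one can
find an $n_0$, depending on $\eta\varepsilon^2$, such that

$$\frac{V_n}{n^2} < \eta\varepsilon^2$$

for all $n > n_0$. For such an $n_0$ we have, therefore,

$$P\Big[\Big|\frac{x_1 + x_2 + \cdots + x_n}{n} - \frac{\mu_1 + \mu_2 + \cdots + \mu_n}{n}\Big| \le \varepsilon\Big] > 1 - \eta$$

whenever $n > n_0$. This proves the theorem.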

Corollary 3.15.1 Let $x_1, x_2, \ldots$ be independently distributed
with identical marginal distributions. Their common expectation
will be denoted by $\mu$ and their common variance by $\sigma^2$. Now,
$(x_1 + x_2 + \cdots + x_n)/n = \bar{x}_n$ may be looked upon as a sample mean for
a random sample (vide Section 15.5) of size n from a population with
mean $\mu$ and variance $\sigma^2$.

Also,

$$V_n = \sum_{i=1}^{n} \mathrm{var}(x_i) = n\sigma^2,$$

so that $V_n/n^2$, which equals $\sigma^2/n$, tends to 0 as $n \to \infty$ if $\sigma^2$ be finite. Hence, given
$\varepsilon > 0$ and $\eta > 0$, however small, we can find an $n_0$, depending on $\varepsilon$
and $\eta$, such that

$$P[|\bar{x}_n - \mu| \le \varepsilon] > 1 - \eta$$

for all $n > n_0$, provided $\sigma^2$ is finite.*
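
For independent, identically distributed variables the proof supplies an explicit, if conservative, sample size: the Chebyshev bound exceeds 1 - eta as soon as sigma^2/(n * eps^2) < eta, that is, as soon as n > sigma^2/(eta * eps^2). The sketch below computes the largest integer not exceeding that threshold, which can serve as n_0. The function name `chebyshev_n0` and the numerical inputs are assumptions made for illustration.

```python
from fractions import Fraction
from math import floor


def chebyshev_n0(variance, eps, eta):
    """An n_0 such that variance / (n * eps**2) < eta for every n > n_0.

    This is the value guaranteed by Chebyshev's inequality; the true
    probability usually exceeds 1 - eta for much smaller n.
    """
    threshold = Fraction(variance) / (Fraction(eta) * Fraction(eps) ** 2)
    return floor(threshold)


# Assumed example: throws of a fair die (variance 35/12),
# tolerance eps = 1/10 and eta = 1/20.
variance = Fraction(35, 12)
eps = Fraction(1, 10)
eta = Fraction(1, 20)
n0 = chebyshev_n0(variance, eps, eta)
print(n0)
```

With these inputs n_0 = 5833, so any n of 5834 or more is sufficient by this argument.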

Corollary 3.15.2 Suppose we have a series of independent
repetitions of an experiment for which the probability of an event A
is $p_0 = P(A)$. Let $x_i$ be a random variable associated with the ith
repetition such that

$$x_i = \begin{cases} 1 & \text{if in the ith repetition A occurs} \\ 0 & \text{otherwise.} \end{cases}$$

Then $x_1, x_2, \ldots$ are independent and identically distributed
random variables, with

$$E(x_i) = 1 \times p_0 + 0 \times (1 - p_0) = p_0$$

and

$$\mathrm{var}(x_i) = (1 - p_0)^2 p_0 + (0 - p_0)^2 (1 - p_0) = p_0(1 - p_0).$$

Also,

$$(x_1 + x_2 + \cdots + x_n)/n = f_n/n \quad \text{(say)}$$

is the relative frequency of the event A in the first n repetitions of
the experiment.

Since $p_0(1 - p_0)$ is necessarily finite ($0 \le p_0(1 - p_0) \le 1/4$), it follows
from the previous result that, given $\varepsilon > 0$ and $\eta > 0$, however small,
we can find an $n_0$, depending on $\varepsilon$ and $\eta$, such that

$$P\Big[\Big|\frac{f_n}{n} - p_0\Big| \le \varepsilon\Big] > 1 - \eta$$

for any $n > n_0$.
This particular form of the law of large numbers has been called
Bernoulli's theorem after James Bernoulli, who, however, arrived at the
result by a much more elaborate argument.

Bernoulli’s theorem justifies and makes more precise the statement 
made in Section 3.8 that the probability of an event in an 
experiment is to be looked upon as the ‘long-run relative frequency’ 
of the event in repetitions of the experiment. 
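
The closeness of the relative frequency to the probability can be computed exactly from the binomial distribution and compared with the lower bound 1 - p(1 - p)/(n * eps^2) that Chebyshev's inequality provides. The sketch below is an illustration under assumed inputs (p = 1/2, eps = 1/10, n = 100); the function names are hypothetical.

```python
from fractions import Fraction
from math import comb


def prob_within(n, p, eps):
    """Exact P[|f/n - p| <= eps], where f is the number of occurrences
    of the event in n independent repetitions (binomial distribution)."""
    return sum(
        comb(n, k) * p ** k * (1 - p) ** (n - k)
        for k in range(n + 1)
        if abs(Fraction(k, n) - p) <= eps
    )


def chebyshev_lower_bound(n, p, eps):
    """Lower bound on the same probability from Chebyshev's inequality."""
    return 1 - p * (1 - p) / (n * eps ** 2)


p = Fraction(1, 2)
eps = Fraction(1, 10)
exact = prob_within(100, p, eps)
bound = chebyshev_lower_bound(100, p, eps)
print(float(exact), float(bound))
```

The exact probability is noticeably larger than the bound of 3/4, which is in keeping with the remark that the inequality gives only a rough guarantee.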


* If the sample space be finite, as we assumed in earlier sections, $\sigma^2$ must be
finite too.




Questions and exercises 


3.1 In what sense do we use the word ‘probability’ in statistical 
theory ? Give the classical definition of probability and point out 
its limitations.

3.2 Give an outline of the axiomatic treatment of probability 
theory. 

3.3. State and prove the theorem on the probability of the union 
of a number of events. 

3.4 Supply an alternative proof for the theorem concerning the
probability of the union of a number of events (Theorem 3.3) by
showing that an elementary event e that is favourable to none of the
events $A_1, A_2, \ldots, A_m$ contributes 0 to the right-hand side of the
equation, while an elementary event e that is favourable to exactly
t ($1 \le t \le m$) of the events contributes P(e).

3.5 Show that, for any events $A_i$ (i = 1, 2, ..., m),

$$P\Big(\bigcup_{i=1}^{m} A_i\Big) \le \sum_{i=1}^{m} P(A_i),$$

while

$$P\Big(\bigcap_{i=1}^{m} A_i\Big) \ge \sum_{i=1}^{m} P(A_i) - (m - 1).$$

3.6 Define conditional probability ; state and prove the theorem 
of compound probability. What is meant by saying that a number 
of events are independent ? 

3.7 What is a random variable, and what are its expectation 
and variance ? What is the covariance of two random variables ? 

3.8 State and prove the (weak) law of large numbers. Deduce,
as a corollary, Bernoulli's theorem and comment on its implications.

3.9 $A_1$ and $A_2$ are two events related to an experiment E.
Given $P(A_1) = 1/2$, $P(A_2) = 1/3$ and $P(A_1 \cap A_2) = 1/4$, determine the
following probabilities:

(a) $P(A_1^c \cup A_2^c)$, (b) $P(A_1^c \cap A_2^c)$, (c) $P(A_1^c \cup A_2)$,
(d) $P(A_1^c \cap A_2)$, (e) $P([A_1 \cup A_2]^c)$, (f) $P([A_1 \cap A_2]^c)$.

3.10 Arrange the following quantities in increasing order of
magnitude with proper equality or inequality signs between them:
$P(A_1)$, $P(A_1) + P(A_2)$, $P(A_1 \cup A_2)$, $P(A_1 \cap A_2)$.




3.11 Assume that neither A nor B is an impossible event. (a) If 
now A and B are mutually exclusive, will they be independent ? 
(b) If A and B are independent, will they be mutually exclusive ? 

3.12 Eight students are arranged at random (a) in a row and (b)
in a ring. In each case, find the probability that two given students
will be next to each other. Ans. (a) 1/4; (b) 2/7.

3.13 Three numbers are chosen at random from the first n
natural numbers. What is the probability that the chosen numbers
will be in arithmetic progression? (Consider separately the cases
when n is even and when n is odd.)

Ans. $3/\{2(n-1)\}$ if n is even;
$3(n-1)/\{2n(n-2)\}$ if n is odd.

3.14 The nine digits 1, 2, ..., 9 are arranged in random order
to form a nine-digit number. Find the probability that 1, 2 and 3
appear as neighbours in the order mentioned. Ans. 1/72.

3.15 Obtain the probability that the birth-days of seven people
will fall on seven different days of the week, assuming equal proba-
bilities for the seven days. Ans. $6!/7^6$.


3.16 A box contains 40 envelopes, of which 25 are ordinary (not
meant for air mail) and 16 are unstamped, while the number of
unstamped ordinary envelopes is 10. What is the probability that
an envelope chosen from the box is a stamped air-mail envelope?
Ans. 9/40.

3.17 Suppose $A_1, A_2, \ldots, A_n$ are independent events and
$P(A_i) = p_i$. Find the probability (a) that none of the events occurs;
(b) that at least one of the events occurs.


3.18 Two players, A and B, throw n+1 and n coins, respectively.
Show that the probability that A will have more heads than B
is 1/2 and the probability that they will have the same number of
heads is $\binom{2n+1}{n} \big/ 2^{2n+1}$.


3.19 Two players, A and B, throw a pair of dice. A, who starts
the game, wins if he throws six before B throws seven, and B wins
if he throws seven before A throws six. Obtain the probability that
A will win the game. Ans. 30/61.




3.20 Show that in throwing a perfect die n times, the probability 
of having a total of s points is the same as the probability of having 
a total of 7n—s. 


3.21 From a full pack of playing cards, 3 cards are taken at
random. Evaluate each of the following probabilities and verify
that their sum is unity:

(a) that the cards are of the same denomination;
(b) that 2 are of the same denomination and one different;
(c) that all are of different denominations.

Ans. (a) 1/425; (b) 72/425; (c) 352/425.

3.22 What is the probability that in a game of poker, which
requires the drawing of 5 cards from a full pack, all the cards drawn
will be (a) of the same colour, (b) of the same suit?

Ans. (a) 253/4998; (b) 33/16660.

3.23 Fifteen balls are distributed at random among 5 boxes.
What is the probability that exactly 2 boxes will remain empty?

Ans. $2[3^{15} - 3 \times 2^{15} + 3]/5^{14}$.

3.24 Obtain the probability that in k throws of a die each of the
numbers 1, 2, ..., 6 will appear at least once.

Ans. $1 - 6(\tfrac{5}{6})^k + 15(\tfrac{2}{3})^k - 20(\tfrac{1}{2})^k + 15(\tfrac{1}{3})^k - 6(\tfrac{1}{6})^k$.

3.25 Five non-similar pairs of socks are in a closet. Four socks
are selected at random. What is the probability that there will be
at least one complete pair among the four socks chosen? Ans. 13/21.


3.26 In the course of an experiment with a particular brand
of D.D.T. on flies, it is found that 80% are killed in the initial
application. Those which survive develop a resistance, so that the
percentage of survivors killed in any later application is half that of
the preceding application. Thus 40% of the survivors of the first
application would succumb to the second, 20% of the survivors of
the first two applications would succumb to the third, and so on.
Find the probability

(a) that a fly will survive four applications;
(b) that it will survive four applications, given that it has
survived the first one. Ans. (a) 0.0864; (b) 0.432.




3.27 The probability that a family will have k children is $\alpha p^k$,
for k = 1, 2, ..., the probability that it will have no child being
$1 - \alpha \sum_{k=1}^{\infty} p^k$. Assuming that the sex-ratio at birth (i.e. the ratio of
number of male births to that of female births) is 1:1, obtain the
probability that a family will have x sons.
Partial ans. $2\alpha p^x/(2 - p)^{x+1}$ for $x \ge 1$.


3.28 It has been found from past experience that of the articles
produced by a factory 20% come from Machine 1, 30% from
Machine 2 and 50% from Machine 3. The percentages of satis-
factory articles among those produced are 95% for Machine 1, 85%
for Machine 2 and 90% for Machine 3. (a) An article is chosen
at random from a lot. What is the probability that it is satisfactory?
(b) Assuming that the article is satisfactory, what is the probability
that it was produced by Machine 1? Ans. (a) 0.895; (b) 0.212.

3.29 Each of n urns has a white and b black balls. One ball is
chosen at random from the first urn and transferred to the second;
then one ball is chosen at random from the second urn and trans-
ferred to the third; and so on. At the end of this operation, if a
ball is taken at random from the last urn, what is the probability
of its being white? Ans. a/(a+b).

3.30 An empty urn is filled with 5 balls, the colour of each ball
being determined by throwing a perfect coin: in the case of a head,
a red ball is put in the urn, while in the case of a tail a white ball is
put. (a) If a ball is now taken at random from the urn, and is
found to be red, what is the probability that the urn has red balls
only? (b) If a second ball is chosen from the urn without returning
the first, what is the probability that this also will be red?

Ans. (a) 1/16; (b) 1/2.

3.31 Three points, X, Y and Z, are taken at random on a line
segment. What is the probability that Z lies between X and Y?

Ans. 1/3.

3.32 If a point is taken at random within a circle of radius
r, what is the probability that its distance from the centre will
exceed r/2? Ans. 3/4.




3.33 Two points are taken at random on the circumference of a
circle of radius r. What is the probability that the distance between
them will not exceed $\pi r/4$? What will be the probability of this
event if, instead, the points are taken on a line segment of length
$2\pi r$? Ans. 1/4; 15/64.

3.34 A chord is drawn at random in a given circle. What is
the probability that it is greater than the side of the equilateral
triangle inscribed in the circle?

Ans. 1/2 if distance of chord from centre is chosen
at random; 1/3 if angle between chord and
tangent at one end of it is chosen at random.

3.35 Three points are taken at random on the circumference
of a circle. What is the probability that they will all lie in a
semi-circle? Ans. 3/4.

3.36 A dealer in electrical goods receives 20 new lamp bulbs. 
He tests them one by one until he finds one of satisfactory quality. 
Suppose x denotes the number of bulbs tested by the dealer. Give 
the possible values of x and the corresponding probabilities, assuming 
that there are 4 defective bulbs in the batch. Hence obtain the 
expected value and the variance of x. 

3.37 10,000 tickets, each of Rs. 2, are to be sold in a lottery
in which there are 2 prizes of Rs. 5,000 each, 10 of Rs. 200 each
and 100 of Rs. 50 each. Determine the expected gain (or loss) of a
man who buys a ticket for the lottery. Ans. Expected loss = 30 P.

3.38 In a lottery n tickets are drawn at a time out of N tickets
numbered from 1 to N. Find the expectation and variance of the
sum of the numbers on the tickets drawn.

Ans. $n(N+1)/2$; $\dfrac{n(N^2 - 1)}{12}\Big(1 - \dfrac{n-1}{N-1}\Big)$.


3.39 An urn has been filled with 25 balls, each of which is either 
black or white, by a chance mechanism. The expectation of the 
number of black balls is known to be 9. Show that the probability 
of drawing a black ball from the urn is 9/25. 

3.40 Balls are taken one by one out of an urn containing a
white and b black balls until the first white ball appears. Determine
the expected number of black balls preceding the first white ball.

Ans. b/(a+1).




3.41 (a) Give a counter-example to show that the converse of 
Corollary 3.12.1 does not hold. 
(b) Prove that if each of x and y takes two possible values, 
then zero covariance implies independence. 


3.42 The random variables x,, xg, -..... are independent and x; 
assumes just two values, =z and eal with equal probabilities. Show 


that the law of large numbers holds for these variables. 


3.43 Let the random variables $x_1, x_2, \ldots$ be such that $x_i$ may
depend on $x_{i-1}$ and $x_{i+1}$ but is independent of all the other variables.
Show that the law of large numbers holds, provided each variable
has a finite variance.

3.44 Let the random variables $x_1, x_2, \ldots$ be such that the
variances are bounded and the covariances are negative. Show that
the law of large numbers applies.


SUGGESTED READING 


[1] Cramér, H. The Elements of Probability Theory (Chs. 1—4). John 
Wiley, 1955. 

[2] Feller, W. An Introduction to Probability Theory and Its Applications, 
Vol. I (Chs. 1—5, 9). John Wiley, 1957, and Wiley Eastern, 
1960. 

[3] Goldberg, S. Probability, an Introduction. Prentice-Hall, 1960.

[4] Hoel, P. G., Port, S. C. and Stone, C.J. Introduction to Proba- 
bility Theory, Vol. I (Chs. 1, 2). Houghton Mifflin, 1971. 

[5] Parzen, E. Modern Probability Theory and Its Applications (Chs. 
1—3, 5, 7,8). John Wiley, 1960, and Wiley Eastern, 1972. 

[6] Uspensky, J. V. Introduction to Mathematical Probability (Chs. 1—3,
9, 10). McGraw-Hill, 1937, and Tata McGraw-Hill.


PART TWO 


GENERAL STATISTICAL METHODS 


4 INTRODUCTION TO
STATISTICAL METHODS 


4.1 What is statistics ? 

Life in the modern world is inextricably bound up with the 
notions of number, counting and measurement. One may try to 
think of a community that cannot count or take measurements 
and yet is concerned with such acts as selling and buying, carrying 
on banking transactions, operating locomotives, cars, ships and 
aircraft, keeping track of the fortunes of their favourite soccer or 
cricket teams, and taking part in government. The overriding 
importance of numerical data in modern life will then be all too
apparent. In speaking about the Chinese, Anatole France once 
remarked: “If they don’t count, they won’t count.” The same 
remark would well apply to any other group of people. 

Statistics, as a plural noun, is used to mean numerical data 
arising in any sphere of human experience—to be precise, numerical 
data which arise from a host of uncontrolled, and mostly unknown, 
causes acting together. It is in this sense that the term is used when 
our daily newspapers give vital statistics, crime statistics or soccer 
statistics of Calcutta, or when the Food Minister in the Lok Sabha 
quotes statistics of sugar exports or those of food grain production. 

Used as singular, statistics is a name for the body of scientific 
methods (the statistical methods) which are meant for the collection, 
analysis and interpretation of numerical data. One has this sense 
in mind when one says that the present book is a text-book of 
statistics or that Mr So-and-So is an outstanding statistician. Not 
that Mr So-and-So has got numerical data on different topics at his 
finger-tips; he is just well versed in statistical methods and their
applications.

4.2 Is statistics a science?

It is to be noted that statistics is not a science in the sense
physics or chemistry or biology or even economics is. Any science
has for its objective the formulation of laws for explaining pheno- 
mena in some part of the real world. Through observations or 
experiments, a science builds up hypotheses regarding the pheno- 
mena, whose validity is then examined through further observations 
or experiments. A hypothesis that stands repeated tests of this kind 
is raised to the level of a law. Indeed, every science is but a body of 
such laws. Statistics, however, is not a body of laws. Among other
functions, it formulates methods for the verification of hypotheses,
for testing whether a hypothesis can claim to be a law. It would
be more proper to describe statistics as a quantitative method of 
scientific investigations. 


Also, statistics has to do with only the group characteristics of
numerical data on the basis of which an investigation will be
conducted. The data may relate, for instance, to the scores of a
number of students in a mathematics test. If our interest lies in the 
score of each student separately, then the data will not be amenable 
to statistical treatment. In statistics, the individual is important 
only in so far as it forms part of the group. 

Now, in studying the characteristics of a group, the variation 
inherent in it has to be taken into account. Statistics has, therefore, 
been called the study of variation. Indeed, a group without any 
variation, one whose members are all alike, is of no interest to 
statistics. For here the numerical data would arise from a fixed
system of causes—not a multiplicity of uncontrolled, and mostly un- 
known, causes. In this sense, statistics may be said to have ushered 
in a new era in scientific research. 

Earlier scientists used to look at things from a deterministic viewpoint.
They would ask: “Does B occur as a result of A?” However,
this meant an over-simplification of phenomena. For the real world 
is such that the conditions under which an experiment is conducted 
vary. Hence as a result of A, B may occur in some cases but may 
fail to occur in others. As such, it would be more pertinent to ask : 
“In what percentage of cases does B occur as a result of A? Or, in 
other words, what is the probability that B will occur as a result 
of A?” The formulation of scientific laws has, therefore, to be made 
in probabilistic, rather than in deterministic, terms. 

It has also to be remembered that a statistician is not interested
solely in the (group) characteristics of the individuals on hand.
He would rather like to have information about the characteristics 
of the bigger group of which the given individuals are but a 
sample. A principal task of statistics is, therefore, the inference of 
the numerical features of the whole group from those of a part. 
This problem is analogous to the problem of classical inductive 
logic. The only difference is that here induction has to be achieved 
within a probabilistic framework. 


4.3 Statistics in matters of state

The government of every country in our time collects, on a 
routine basis, numerical data in relation to its people, its economy, 
its natural resources and its socio-political condition. Such basic 
data as area, current population and growth rate of population, per 
capita national income, etc., are now available for even the remotest
parts of the world. 

A welfare state such as India, on the other hand, needs more 
extensive information relating to itself. To provide facilities in 
maternity hospitals or to arrange for the education of children, it must 
have data regarding the total number of births in previous years so 
that proper estimates may be had of the number of births and also 
of the number of children in different age-groups in future years. 
To provide hospital treatment for the sick, it must know the number 
of people that fall sick during a year and also the number that 
remain sick at any time of the year—both from all diseases taken 
together and also separately from every important disease. In order 
that all residents may be provided with their food needs, estimates are 
required of the total food production of the country for the current 
year and for some coming years, too, so that, if need be, food imports 
from other countries may be arranged. In the case of a likely 
surplus in food production, the country may plan to add to its 
foreign exchange reserves by exporting the surplus. 

Almost all countries in our time, those in the Third World in 
particular, have accepted planning as a matter of policy. India, 
for instance, has her Five-Year Plans that have formed a cornerstone 
of her efforts towards socio-economic development. Statistics 
regarding the diverse facets of national life are called for in order
that the Plans may be properly formulated and implemented, too.
Each of our Five-Year Plans has been based on detailed statistics, 
for the collection of which an elaborate machinery has been set up 
by the Government. Furthermore, in formulating the Plans on 
the basis of the data, some sophisticated statistical techniques are 
employed. 

We may also mention here politicians and social reformers who 
employ statistical facts as a basis for policy-making. To the layman, 
a knowledge of statistical methods may come in handy in viewing 
in their proper perspective the data that are presented to him day in 
and day out by the Government and the political parties. This will 
make him a more responsible citizen, not to be taken in by unscru- 
pulous propaganda. 


4.4 Statistics in business and commerce 

Business and commercial activities being very similar to the 
functions of a government, statistics and their interpretation also 
prove helpful for the proper organisation of business and 
commerce. 

A manufacturing concern needs to have a clear idea not only 
about its own resources but also about the likely demand for its 
products. Without this knowledge, it cannot properly draw up its
production schedule, employ the right number and right kinds of 
employees and procure the right amounts of raw materials. 

A sales firm, too, must have a clear idea about the demand for
goods that it deals in and about the rival firms in the field. Only
then can it properly fix the selling price per unit of each commodity
so as to cover its costs and have a fair margin of profit, and employ
the right number of salesmen to keep pace not only with the overall
demand but also with the seasonal variation in demand.

Modern manufacturers even bank upon statistical techniques to 
maintain the quality of products and to minimise inspection costs in 
supplying lots of items to consumers, at the same time keeping in 
control the chances of a high number of defectives being included in 
a lot. Techniques of operations research, that help in the smooth 
and efficient running of business and commerce as well as of govern- 
ment, are heavily dependent on statistical ideas. 




4.5 Statistics in agriculture 

We may now turn to another important sector of a country's
economy, viz. agriculture. For a country like India, it is by far the
most important sector. For such a country, a forecast of agricultural
production, for each of the principal crops, has to be made well
before harvest on the basis of estimates of the total area under the
crop and its average yield, say average yield per acre. These
estimates are obtained from area sampling and crop-cutting
experiments, which are designed strictly in accordance with statistical
principles. The analysis of the data also depends on statistical
methods.

In order to increase agricultural production to keep pace with the
increasing population, many countries are relying heavily on
high-yielding varieties, especially for food crops like wheat, rice, maize,
etc. In evolving new varieties, genetic studies requiring statistical
tools have to be conducted. For comparing two or more varieties
of a crop in respect of average yield, statistical techniques of field
experimentation come in handy. A whole branch of statistics, viz.
experimental design, has been developed mainly to take care of
this need.


4.6 Statistics in the sciences 

What distinguishes the role of statistics in today's world is its
pre-eminence in scientific research. Indeed, most of the principles
and techniques of statistics have been developed primarily to meet
this need. Each science formulates laws to explain phenomena that
come up in some part of the real world. Thus the physical sciences
deal with the physical properties of matter, botany deals with the
world of plants, zoology deals with the world of animals, and so on.
In going to formulate such laws, one has to conduct investigations
and note their results. In modern times, the results are almost
invariably recorded in numerical terms. This is true not only of the
physical sciences, but also of the biological, social and behavioural.
The new laws of science, too, almost invariably concern quantitative
aspects of things. The ideas and methods of statistics have, in
consequence, been useful to the sciences: in the conduct of investigations,
in the analysis of findings and also in the formulation of laws.




Consider first the physical sciences, viz. physics, chemistry and
astronomy. These have been called the exact sciences, implying
that they deal with exact relationships between phenomena. However,
the fact that even in experiments conducted under laboratory
conditions extraneous factors cannot be totally eliminated points to
the relevance of statistical ideas even to these sciences. Perhaps
this point has long been, tacitly, conceded by men of these sciences
in their adoption of the practice of using the average of several
measurements while seeking to determine the magnitude of a
physical constant. What is more important, many, if not all, laws
of these sciences should be viewed as statistical laws. For instance,
the law $h = \frac{1}{2}gt^2$, relating the distance traversed and the time taken
by a falling body, which is supposed to be exact, should really be
regarded as a statistical law, representing the regression of h on t or
the regression of t on h, according as the values of t or those of h are
kept under control in conducting the experiment. Certain newer
laws of these sciences are more directly based on statistical ideas.
The phenomenon of Brownian motion has, for example, been viewed
as a stochastic process. (The subject of stochastic process has to do
with the dependence of the probability distribution of the states of a
phenomenon on time and/or space.)
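
The reading of the falling-body law as a regression of h on t can be made concrete with a small computation. The sketch below is only an illustration: it assumes that the times are controlled, that the heights are measured with error, and that the constant g is estimated by least squares through the origin, which gives the estimate 2 * sum(h * t^2) / sum(t^4). The measurements are hypothetical and the function name `estimate_g` is invented for this example.

```python
def estimate_g(times, heights):
    """Least-squares estimate of g in h = (1/2) g t^2, fitted through
    the origin with the times treated as controlled values."""
    num = sum(h * t ** 2 for t, h in zip(times, heights))
    den = sum(t ** 4 for t in times)
    return 2.0 * num / den


# Hypothetical measurements (seconds, metres), not taken from the text.
times = [0.5, 1.0, 1.5, 2.0]
heights = [1.245, 4.87, 11.035, 19.58]

g_hat = estimate_g(times, heights)
print(round(g_hat, 3))
```

No single measured pair satisfies the law exactly, yet the fitted value comes out close to 9.8, which is the sense in which the law describes the average behaviour of the measurements.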

The biological sciences have long been relying on the concepts
and techniques of statistics. Indeed, most of the methods of modern
statistics have been developed primarily keeping in view this field
of application. In classifying plants or animals, which is the problem of
taxonomy, biologists formerly used to focus their attention on
qualitative characters alone, e.g. colour, shape, etc. But the problem
of taxonomy is now tackled by considering quantitative characters
as well. The classification of humans as brachycephalic, mesocephalic
and dolichocephalic, to take a simple example, is done by
taking into account the values of the cephalic index, which is 100
times the ratio of skull breadth to skull length. Consequently,
statistical methods have found extensive applications in the field of
biology. Even some of the laws of biology that concern qualitative
characters, like Mendel's laws, are really statistical laws. In recent
times, statistical methods of experimentation have been widely used
in botany, zoology, physiology and biochemistry.




The social sciences also have been relying more and more on the 
methods of statistics in the collection and analysis of information.
Statistical methods of sampling had their origin mainly in the need
for data collection in such fields as economics, sociology, political
science, social anthropology, etc. The very common methods of
time-series analysis, demand analysis and index number construction
that are employed by economists are essentially statistical methods.

Statistical methods are finding applications in the behavioural 
sciences, too. A psychologist may be required to assess the mental 
ability of students and suggest a proper mode of training for each. 
He will then rely on the statistical analysis of data regarding groups 
of children similar to them. In the field of education, there is 
the problem of comparing and combining the scores obtained by
students in a number of subjects. This problem can be dealt with
only by statistical means. Such techniques as item analysis and 
factor analysis, that have been very popular with educationists, 
are also statistical techniques. 


4.7 A historical note 

The fourth book (“Book of Numbers”) of the Old Testament 
starts with the words: “And the Lord spake unto Moses in the 
wilderness of Sinai, in the tabernacle of the congregation, on the 
first day of the second month, in the second year after they were 
come out of the land of Egypt, saying, Take ye the sum of all the 
congregation of the children of Israel, after their families, by the
house of their fathers, with the number of their names, every
male by their polls; From twenty years old and upward, all that
are able to go forth to war in Israel: thou and Aaron shall number
them by their armies.” In the subsequent passages one finds 
the results of that early census of the fighting men of Israel conducted 
by Moses around 1500 B.C. But censuses in ancient Babylonia,
China and Egypt were taken as early as 3000 B.C. for purposes
of taxation. (The word ‘census’ itself is derived from the Latin 
word censere, which means ‘to tax’.) 

The Roman census was established by the sixth king of Rome,
Servius Tullius (578-534 B.C.). Under this system, officials called
censors drew up, at five-year intervals, registers of the people and
their property for taxation purposes and to determine the number 
of able-bodied fighting men. The census was extended in 5 B.C. by
Caesar Augustus to include the whole Roman Empire. As the 
New Testament has in its first lines : “And it came to pass in those 
days, that there went out a decree from Caesar Augustus, that all 
the world should be taxed.” Joseph and Mary journeyed to 
Bethlehem for such registration and there was born the infant Jesus.

With the collapse of the Roman Empire regular periodic censuses 
in the Western world were not conducted until the 17th century. 

During the Middle Ages the system of feudalism rendered national 
censuses impossible, although there were occasional attempts to 
revive them. The breviary of Charlemagne in 808 A.D. is a notable
example. In 1085, William the Conqueror ordered a statistical
survey of England. The Domesday Book contains the record of this 
survey. In the course of the survey information was collected on 
land, land-owners, land use, tenants, servants and livestock. These 
data served as the basis for taxation till 1522, when a new Domesday 
Book was prepared. 

Early in the 16th century, Bills of Mortality began to appear in 
London, presumably on the initiative of Henry VIII. Initially, 
the Bills of Mortality recorded only deaths from the plague. But 
over the years, their scope was expanded to include christenings and 
also deaths from other diseases.

Very early censuses were conducted in the Americas by the 
Spanish conquerors. A census of Peru was carried out as early 
as 1548. But before the arrival of the Spaniards, the Incas had 
their own system of recording data. They made use of intertwined 
coloured strings and knots known as quipus. A register was main- 
tained of all the births and deaths occurring in the country and 
exact returns of the actual population were made annually to the 
government by means of the quipus. 

When the Constitution of the United States was framed, the 
census was made a regular and vital part of the government. The 
first census was taken in 1790, and other censuses have followed 
every ten years since. Besides, censuses in some specific fields have 
been taken at more frequent intervals.


Etymologically, the term statistics can be traced to the Latin
words status (meaning state) and statista (meaning statesman). 
Aristotle’s Polity contains a comparative description of 158 states.
This initial attempt at a comparative description of states was 
later developed by Italian and German scholars into a subject 
called statistics (staatenkunde in German). 

The word statistics itself was coined by the German scholar 
Gottfried Achenwall around the middle of the 18th century. It was 
used for the first time in Great Britain by John Sinclair, who gave, 
in a series of volumes published between 1791 and 1799, a statistical 
account of Scotland prepared on the basis of communications from 
the ministers of the different parishes. Statistics appeared in the 
Encyclopaedia Britannica in 1797. It is found that in Great Britain
the word publicistics competed for a time with statistics in literary
usage, but was soon abandoned in favour of the latter. 

Numerical data, as we have seen, have been collected and
interpreted by many societies for a few thousand years. But the
methods used could be called statistical only after they began to be 
grounded in the ideas of probability in the 19th century. 

The earliest work on statistical methodology related to the 
normal distribution. It was derived in 1733 by De Moivre in an 
obscure paper (that was discovered by Karl Pearson in 1924). In 
1783, Laplace put forward the normal curve as being appropriate 
for the probability distribution of errors. In 1809, Gauss published 
his work on the theory of motions of heavenly bodies. He derived 
in it the normal curve as appropriate for a law of errors, at 
the same time acknowledging Laplace’s earlier contribution. But 
statistics, as we know it today, may be said to have its origin in 
the work of the Belgian mathematician Quetelet (1796-1874). He 
observed, for instance, that if the heights of a large group of people 
were shown in a bar diagram, the picture resembled the normal 
curve. He gave to the British Statistical Association in 1841 a list
of more than 40 topics that could be studied by statistical methods.

In the latter half of the 19th century, Darwin’s The Origin of
Species had a profound effect on many of the great minds. Statistics
played a major rôle in the study of the laws of heredity. In
Austrian Silesia (now in Czechoslovakia), Gregor Mendel started
his experiments with the edible pea, Pisum sativum. This work,
continued from 1856 to 1863, led Mendel to conclude that the laws 
of heredity are statistical in nature. In England, the work of 
Darwin inspired his cousin, Francis Galton, to undertake his 
own work on the theory of evolution. Better equipped mathemati- 
cally than Darwin, Galton published many articles and books on 
heredity. (His Hereditary Genius came out in 1869 and Natural 
Inheritance in 1889.) Galton was the first to use the statistical 
concepts of correlation and regression. 

Two of the most important centres of early statistical research 
were the University College, London, and the Rothamsted 
Experimental Station, headed by Karl Pearson and R. A. Fisher, 
respectively. In the 1890s, Karl Pearson began to apply mathematics
and probability to Darwinian evolution. In 1901 the first issue of 
Biometrika, later to become a prestigious journal of statistics, appeared 
with Pearson as one of the editors. Through the work of Pearson 
and his associates, involving application of statistical methods to a 
wide variety of problems in evolution, heredity and related fields, 
probability came to be closely associated with statistics. A great 
deal of their work was involved with the statistical description of 
populations through a system of frequency curves. It was also
Pearson who illustrated the use of the chi-square statistic in testing 
the hypothesis that a given random sample has come from a 
specified distribution. 

Although some of their work was concerned with samples, the 
distinction between populations and samples was not always 
clearly maintained. The distinction came out better in the textbook
of G. U. Yule (first published in 1910) and better still in the
work of R. A. Fisher. The formulation of a definition of random 
sampling may be attributed to John Venn and C. S. Peirce. The 
first table of random digits was published by Tippett in 1927. 

Another important contributor to statistics was W. S. Gosset.
After graduating from Oxford, he joined Guinness breweries of
Dublin in 1899. Although he came to work as a brewer, soon he
began to apply his creative genius to the analysis of data and
planning of barley experiments. He also studied statistics at
University College for a year. Gosset published a series of papers
under the pseudonym ‘Student’. The most important of these was
the one entitled “The probable error of a mean”, which contained
several remarkable results, including the distribution of a statistic
that has come to be known as ‘Student’s t’.

R. A. Fisher had his training in mathematics at Caius
College, Cambridge. After graduation, he volunteered for military
service but was rejected for poor eye-sight. He worked for some
time on a farm in Canada and then taught mathematics for four
years at a number of public schools. He joined Rothamsted
Experimental Station in 1919 and soon attracted the attention of his
employers as a genius. The Rothamsted Station was established by
John Bennet Lawes, an Oxford-trained chemist, on his family estate
near Harpenden, England. The first agricultural experiments were
started there in 1843. In 1889, Lawes established a trust fund to
ensure the continuation of the experiments, and when Fisher came
to work there it had already become world famous. Because of
Fisher’s pioneering work in experimental design and other areas
from 1919 to 1933 (he later moved to University College, London),
the fame gradually encompassed statistics as well as agriculture. 

Theoretical developments in statistical inference began with 
the work of Karl Pearson and that of ‘Student’. But they received a
great fillip at the hands of Fisher, Jerzy Neyman and E. S. Pearson.
The technique of analysis of variance was introduced by Fisher in
1923.

Point estimation is the process of calculating a specific value, 
called an estimate, from sample data to be used for the parameter 
value (e.g. the mean or the standard deviation) in a distribution.
The goodness of an estimate is to be judged from the behaviour of 
the method of estimation in the totality of possible samples. Fisher 
gave criteria for a good estimate in a paper published in 1925. 

J. Neyman and E. S. Pearson presented in 1933 a satisfactory 
theory of hypothesis-testing. In their formulation, a test is a formal 
accept-reject rule. The theory requires specification of both a null 
and an alternative hypothesis. Good tests are those which can 
control the errors of rejecting the null hypothesis when it is true 
and accepting it when the alternative hypothesis is true. 

In the early 1930s, Fisher and Neyman put forward independent
formulations of interval estimates of parameters. Fisher’s fiducial
interval carries with it a probability statement about the parameter
value for a specified sample. Neyman’s confidence interval, on the other
hand, carries with it a probability statement about intervals (arising
from repeated sampling) for a specified parameter.
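
The repeated-sampling reading of a confidence interval can be illustrated by a small simulation. The sketch below is only an illustration under stated assumptions: a normal population with known standard deviation, the usual 95 per cent multiplier 1.96, and arbitrary values for the mean, sample size, number of trials and random seed. The function names are hypothetical.

```python
import math
import random


def confidence_interval(sample, sigma, z=1.96):
    """Interval for the mean of a normal population with known sigma (assumed)."""
    mean = sum(sample) / len(sample)
    half_width = z * sigma / math.sqrt(len(sample))
    return mean - half_width, mean + half_width


def coverage(mu, sigma, n, trials, seed=0):
    """Fraction of repeated samples whose interval contains the fixed mu."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        low, high = confidence_interval(sample, sigma)
        if low <= mu <= high:
            hits += 1
    return hits / trials


# The parameter is held fixed; only the intervals vary from sample to sample.
print(coverage(mu=50.0, sigma=10.0, n=25, trials=2000))
```

The proportion printed should lie close to 0.95: the probability statement attaches to the procedure that generates the intervals, not to any one interval already computed.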

Although Great Britain started as the nursery of statistics, the 
centre of attention gradually shifted to the United States. The 
beginning was made by G. W. Snedecor, who became interested in 
statistics as a young mathematics teacher at Ames, Iowa. In the 
early 1920s, Snedecor conducted agricultural experiments at the 
Iowa State College Farm. His efforts led to the founding of the
Statistical Laboratory in 1933. The statistics programme at Ames 
attracted outstanding statisticians like Kempthorne and Cochran. 
Gradually other US universities started their own statistics programmes.
Neyman came to join the University of California at Berkeley,
Feller Princeton University and Wald Columbia University.
Wald’s contributions to sequential analysis and decision
theory in the late 1940s ushered in a new era in statistical thinking. 
An area which has been enriched by statisticians in the USA and 
India is multivariate analysis. Although introduced by Fisher and 
Pearson, the subject developed in the hands of Hotelling, S. N. Roy,
Anderson and C. R. Rao. The current principal area of research is
non-parametric inference, where the aim is to develop methods of 
inference that would be appropriate under very general conditions 
on the population distribution. 

There is also an active centre of research in the Soviet Union. 
The work of Kolmogorov, Smirnov, and others, has been mainly in the 
fields of probability, stochastic processes and mathematical statistics. 


4.8 Statistics in India 

Kautilya’s Arthashastra (321-296 B.C.) contains detailed instructions
for the conduct of agricultural, population and economic
censuses in both rural and urban areas on a scale which is to be
considered rare even by modern standards. From a more recent
past, the Ain-i-Akbari (circa 1590 A.D.) tells us about khan sumaris 
that provided the Moghul emperors with rather detailed numerical 
records regarding the empire—its extent, resources, social condition, 
population, industry and wealth.


Population censuses in India started to be taken on a systematic 
and regular basis in 1871-72 by the British rulers. Occasional 
surveys on a wider scale also used to be conducted, one of the most 
notable being the survey of Eastern India conducted by the East 
India Company in 1807-15. 

But the study of statistics as a scientific discipline and its application
were introduced in this country by P. C. Mahalanobis in the
early 1920s. He came to know of the work of British statisticians
like Karl Pearson and R. A. Fisher as a student at Cambridge. As
Professor of Physics in Presidency College, Calcutta, he carried out
a number of statistical studies with the help of part-time computists.
Gradually, a group of young and talented scientists gathered round
him. They belonged to diverse fields, but Mahalanobis brought 
them together by kindling in them an interest in statistics. They 
worked in what came to be known as the Statistical Laboratory, 
located in the room of Mahalanobis in Presidency College. With 
the expansion of their activities, they felt the need for a separate
institute solely devoted to the study of statistics. The Indian 
Statistical Institute was founded in 1931, but for about twenty years 
it remained almost a part of the college, housed in a set of rooms 
of the Physics Department. It was only after this period that
the ISI was shifted to its present sprawling campus at Baranagore. 
It was also on Mahalanobis’s initiative that Sankhya, the Indian 
Journal of Statistics and one of the finest of its kind in the world, 
was started in 1933. A separate department was started for post- 
graduate teaching in statistics, for the first time in India, by Calcutta 
University, in 1941, and Presidency College, Calcutta had the 
distinction of introducing undergraduate Honours teaching in the 
subject, for the first time in the country, in 1944. 

Till Mahalanobis turned his attention to the matter, official 
statistics used to be collected in India only as a by-product of 
administration. At Mahalanobis’s suggestion, the Government 
took steps towards improvement of the system of data-collection. 
A Central Statistical Unit was started by the Government in 1949.
Two years later, the Central Statistical Organisation was established
to co-ordinate the work of both the Central and the State Governments.
The National Sample Survey was created in 1950 for the
collection of socio-economic data through sample surveys on a 
continuing basis. Nor did Mahalanobis ignore the industrial 
sector: it was largely on his initiative that Indian industrialists 
adopted statistical techniques in such fields as surveying consumer 
demand and consumer preferences and controlling the quality of 
products. At the present time, every manufacturing or trading
concern worth the name in India has a separate statistical wing. 

Meanwhile, the ISI has grown into a mini-university. Besides,
quite a number of Indian universities now have full-fledged post-graduate
departments of statistics. The researches carried out by
Indian statisticians (Mahalanobis, R. C. Bose, S. N. Roy, C. R. Rao
and many others) have won for them international recognition.
Indeed, it would be no exaggeration to say that the work initiated
by Mahalanobis and carried on by his associates and statisticians of
later generations has taken India to the very centre of the world
statistical map.


Questions and exercises

4.1 Explain, with suitable examples, the two senses in which
the term statistics may be used.

4.2 Is statistics a science? Or is it a scientific method? Justify
your answer.

4.3 Discuss the rôle of statistics in matters of state, and add a
brief note on the etymology of the word statistics.

4.4 Write a note on the rôle of statistics in business and
commerce.

4.5 Discuss the relevance of statistics to agriculture.

4.6 Discuss the usefulness of statistics to the sciences.

4.7 Statistics has been called by R. A. Fisher ‘the key technology’
of the twentieth century. Explain why it has been so called.

4.8 Write in brief the history of statistics.

4.9 Write a note on the history of statistics in India.


SUGGESTED READING 


[1] Croxton, F. E. and Cowden, D. J. Applied General Statistics
(Chs. 1). Prentice-Hall, 1964. 

[2] Fisher, R. A. Statistical Methods for Research Workers (Ch. 1). 
Oliver & Boyd, 1954. 

[3] Folks, J. L. Ideas of Statistics (Lecture 1). John Wiley, 1981. 

[4] Mahalanobis, P. C. “Why statistics?”, Sankhya, 10 (1950),
pp. 195-228. 

[5] Rao, C. R. et al. (ed). “Scientific contributions of Professor 
P. C. Mahalanobis”, Contributions to Statistics (dedicatory volume 
presented to Professor Mahalanobis on his 70th birthday), 
pp. 495-516. Statistical Publishing Society, 1964. 

[6] Wallis, W. A. and Roberts, H. V. Statistics: a New Approach 
(Chs. 1, 2). Methuen, 1957. 

[7] Westergaard, H. Contributions to the History of Statistics. Agathon 
Press, 1968. 

[8] Yule, G. U. and Kendall, M. G. Introduction to the Theory of 
Statistics (Introduction). Charles Griffin, 1950. 


5 COLLECTION AND 
PRESENTATION OF DATA 


5.1 Primary data and secondary data

Statistics being a body of methods meant for the study of 
numerical data, it is obvious that the first step in any statistical
enquiry must be the collection of the relevant numerical data. To 
study the growth of steel production in India since 1947, it is 
necessary to obtain the actual production figures for all years from 
1947 to date. If our aim is to study the efficacy of a given cure for 
bronchial asthma, we must collect data on people suffering from 
bronchial asthma and see how many got cured (or had a remarkable 
degree of relief) and how many did not after a course of treatment 
with the drug. 

Now, the data may be of two broad types: primary and secon- 
dary. The ordinary user of economic and social statistics will find 
that the data have been already collected by some other agency, 
government or private ; these may exist either in a published or in 
an unpublished form. His job will then be simply to have access to 
the source and get hold of the data. Such data will be called 
secondary data. Government departments collect data on diverse 
topics that touch the life of the people as a matter of routine and as 
an essential basis of administration. Private agencies like banks and 
industrial concerns regularly compile figures on their assets and 
liabilities, number of employees, income of employees, etc. The
enquirer may get his material readymade from such agencies ; or 
he may get the data in a rough form and adapt them to his needs. 
In some cases, the enquirer will find that the relevant data have 
been collected by some research organisation as part of an 
investigation similar to his own.

In making use of secondary data, the enquirer has to be particularly
careful about the nature of the data: their coverage, the
definitions on which they are based and their degree of reliability.
Maybe, he will find that the available data are more extensive than
is required for the purpose of his enquiry. In such a case, he will 
naturally discard the part of the data that is redundant. Sometimes 
he may as well find that the available information is inadequate for 
the purpose of his enquiry. He will then have to decide whether to 
collect his own data, either to base his enquiry solely on them or to 
plug the lacunae in the secondary data.

Data collected primarily for the purpose of the given enquiry 
are called primary data. These are collected by the enquirer, either 
on his own or through some agency set up for the purpose, directly 
from the field of enquiry. It goes without saying that this type of 
data may be used with greater confidence, because the enquirer will 
himself decide upon the coverage of the data and the definitions to 
be used and, as such, will have a measure of control on the 
reliability of the data.


5.2 Collection of data 

The design of an enquiry and the setting up or modification of 
machinery for the collection of data are operations that deserve 
serious attention. Careful and detailed planning in the initial
stages can lead to saving in time and money and improvement in
accuracy. As complete a plan as possible should be drawn up
before the actual collection of information begins, specifying what
data are to be obtained, from whom and by what methods. There
should also be full and unambiguous definitions of terms, clear
instructions to investigators and respondents and, maybe, some
indications of the mode of analysis of the results. Although the plan
should be complete, it should not be totally rigid, for some adjustments
to the plan will be inevitable as the collection and analysis of
data proceed. 

A fundamental question to be considered at the outset is 
whether the collection of data should be done by complete enumeration 
or by sampling. In the former case, each and every individual of 
the group to which the data are to relate is covered, and informa- 
tion gathered for each individual separately. In the latter, only 
some individuals forming a representative part of the group are
covered, either because the group is too large or because the items
on which information is sought are too numerous. Complete
enumeration may lead to greater accuracy and greater refinement 
in analysis, but it may be a very expensive and time-consuming 
operation. A sample designed and taken with care can produce 
results that may be sufficiently accurate for the purpose of the 
enquiry, and it can save much time and money. We shall discuss 
in Volume Two the considerations that should guide us in choosing 
between complete enumeration and sampling and discuss some 
methods of getting representative samples. In some cases a combination
of the census and sample methods may be advisable. Thus
frequent sample surveys may be used, as in demography, to fill the 
gaps between censuses taken at regular intervals. Or, some simple 
questions may be asked of every one, while more complicated 
questions may be put only to a proportion (say 5 per cent or 10 
per cent) of all respondents in an enquiry. 
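
The last arrangement, in which every respondent answers the simple questions and only a proportion answer the more complicated ones, can be sketched as follows. The text does not say how the proportion is to be chosen; simple random sampling without replacement is assumed here, and the function name, the respondent numbering and the seed are all hypothetical.

```python
import random


def choose_subsample(respondent_ids, fraction, seed=None):
    """Pick a simple random subsample (without replacement) of the respondents."""
    if not 0 < fraction <= 1:
        raise ValueError("fraction must lie in (0, 1]")
    ids = list(respondent_ids)
    rng = random.Random(seed)
    size = max(1, round(fraction * len(ids)))
    return set(rng.sample(ids, size))


respondents = range(1, 201)  # 200 hypothetical respondents
detailed = choose_subsample(respondents, fraction=0.10, seed=1)

# Every one gets the simple questions; the subsample also gets the detailed ones.
schedule = {
    rid: ("simple", "detailed") if rid in detailed else ("simple",)
    for rid in respondents
}
print(len(detailed))  # 20
```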

Note, again, that the information sought may be gathered, from 
the individuals of the whole group (called the population) or from 
those of the sample, by one of three methods*—the questionnaire 
method, the interviewer method and the method of direct observation. 

In economic and social enquiries, information is almost always
collected by having someone to fill up a form or questionnaire. 
But a matter to be decided is whether the forms should be 
completed by an enumerator or investigator who collects data by 
asking questions and noting down answers, or whether these should 
be left with the respondent to be filled up on his own. In the 
questionnaire method, each informant (or respondent) is provided with
a questionnaire, usually sent by mail with return postage prepaid, 
and is asked to supply the information in the form of answers to the 
questions. Obviously, this method can be effective only when the 
informants have attained a certain level of education. It can work, 
for instance, when a daily newspaper decides to conduct an opinion 
poll among its readers on some topical issue. The drawback of the 
method is that the informants may not evince sufficient interest in 
the enquiry even if they are sufficiently enlightened. Consequently, 
the data may involve a high percentage of non-response and thus 
fail to reflect the true state of the field of enquiry. 


* In some cases, a combination of the methods may be used. The
Indian census, e.g., uses the interviewer method for general items and the questionnaire
method for items that concern people with scientific or technical qualifications.


In the interviewer method, enumerators go from one informant to 
another and elicit the required information. This method is used 
in population censuses. Also, it is the method that has to be 
employed in case the informants are not all literate or, even if 
literate, have not attained the requisite educational level. For 
instance, if one is interested in family income and expenditure on 
different items, one may arrange to interview the head of each 
family and collect the information sought from him, The data 
collected by this method are likely to be more accurate, since a tact- 
ful investigator may persuade the informant to supply the required 
information and the meaning of each question may be properly
explained to him so that the answers may be correct and to the point. 

Whichever of the two methods may be used, the questionnaire 
and the accompanying instructions to enumerators and respondents 
have to be very carefully designed. It is necessary that each
question be clearly phrased and capable of unambiguous answer. 
The instructions must take into account all possibilities, even 
remote ones. The way a question is put may well influence the 
answer, as those who have conducted public opinion polls will bear 
out. A device often used to advantage is to insert a question
meant primarily to provide a check on the answers to other questions. For
instance, the relationship of the members of the household to one
another may be used to check the stated age figures. It is for this
reason that forms often include apparently unnecessary or irrelevant
questions. The report of a statistical enquiry should include the
layout of the form used. 

In the method of direct observation, the enquirer or his 
assistants get the data directly from the field of enquiry without 
having to depend on the co-operation of informants. When data 
are needed on the height and weight of, say, 200 college students, 
they will be approached individually and the height (say in cm.) of 
each measured with a tape and the weight (say in kg.) measured
with a weighing balance. If data are needed on the sentence-length
of a novel by, say, Bankimchandra, the enquirer himself will
go through the book and note for each sentence the length, i.e. the
number of words contained therein. On the other hand, if data
are required on the incidence of blindness among a group of people,
one will just observe each member of the group and note whether he 
or she is or is not blind. The direct method of data collection may, 
therefore, involve either measurement or counting or bare observation. 


5.3 Scrutiny of data 

The data collected should be subjected to a thorough scrutiny to
see if they may be considered correct. This is important, for 
however excellent the statistical methods of data analysis may be, 
they cannot bring out useful, reliable information from faulty, 
unreliable data. As we indicated earlier, the scrutiny has to be 
very thorough indeed in case the data are of the secondary type. 

Certain inaccuracies may be readily detected, e.g. inaccuracies 
that arise from the dropping or shifting of a decimal point and some 
of those that arise from the substitution of a 1 for a 7 or vice versa,
or of a 6 for a 9 or vice versa. Thus, consider the following set of
figures which contains two erroneous entries: sheer common sense
enables the reader to detect them. 


Stature (in cm.) of 10 college students

140.9   161.2   153.9   172.2   162.9
159.1   147.2   7735    181.5   15900
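
The common-sense check applied to these figures amounts to comparing each entry with a range of plausible values. A minimal sketch in Python follows; the figures are transcribed from the table, while the limits of 120 cm and 220 cm and the function name are assumptions made only for illustration.

```python
# Stature figures (cm) transcribed from the table above.
statures = [140.9, 161.2, 153.9, 172.2, 162.9,
            159.1, 147.2, 7735, 181.5, 15900]


def implausible(values, low, high):
    """Return (position, value) pairs lying outside the range [low, high]."""
    return [(i, v) for i, v in enumerate(values) if not low <= v <= high]


# Assumed limits for adult stature in cm; adjust to the field of enquiry.
suspects = implausible(statures, low=120.0, high=220.0)
print(suspects)  # [(7, 7735), (9, 15900)]
```

A check of this kind only flags entries for investigation; it cannot say what the correct figure should have been.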


In a second type of situation, there may be figures which, 
although not impossible, are very unlikely to be true and should 
rouse suspicion. If 3 kilograms of rice is stated to be the daily
consumption of a family of 4, the matter calls for investigation.
Similarly, we should be hesitant in accepting 30 as the age of a
son when the father’s age is stated to be 45. If it is found that the
monthly income of a single person in a group of 500 is Rs. 2,000
while that of every one else in the group is less than Rs. 690, one
should take the first income figure to be suspect. One should make
enquiries to see if there is anything special about the work done by 
the person to make his income so high compared to those of the 
others in the group. 

Sometimes the year or the month to which a figure relates may
be stated wrongly. Sometimes again, a figure may be given corresponding
to February 30 or April 31 or, as in the case of a manufacturing
concern’s production statistics, corresponding to a day
which was not a working day. Such mistakes may be readily
corrected ; only at times the enquirer may have to refer to the agency 
supplying the data to get the right year, month or day. 
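
Impossible dates and non-working days of this kind can be caught mechanically. The sketch below relies on Python's standard `datetime.date`, which raises `ValueError` for a date that does not exist. Treating Monday to Saturday as working days is an assumption made for illustration; public holidays are not handled, and the function name is hypothetical.

```python
import datetime


def check_date(year, month, day, working_weekdays=range(0, 6)):
    """Return a list of problems found with a reported date (empty if none).

    Weekdays are numbered Monday = 0 ... Sunday = 6; the default treats
    Monday to Saturday as working days, which is an assumption.
    """
    try:
        d = datetime.date(year, month, day)
    except ValueError:
        return ["no such calendar date"]
    if d.weekday() not in working_weekdays:
        return ["not a working day"]
    return []


print(check_date(2023, 2, 30))  # ['no such calendar date']
print(check_date(2023, 1, 1))   # ['not a working day'] (a Sunday)
print(check_date(2023, 1, 2))   # []
```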

In certain situations, one may have data consisting of two or 
more related series of figures. Here the data may be scrutinised by 
comparing the series for internal consistency. For the data in each
series, taken separately, may look all right; but the figures in the
different series may be incompatible, thus pointing to the presence
of inaccuracies in the data. To take a simple example, for a number
of people the age figures as well as the birth dates may be 
available. These should then be compared for consistency. To 
take another, when the scores of students appearing in an examina- 
tion are available, the individual subject scores should be compared 
to the aggregate score. When we are given, for each of a number 
of families, the total income, the total expenditure and the total
savings over a certain period, we should see if the relation

$$\text{savings} = \text{income} - \text{expenditure}$$

holds for each family. When statistics of rice production are given,
three series of figures may be available, viz. a series for total area
sown (or harvested) in acres, one for total yield in quintals and one
for yield-rate in quintals per acre, the different figures in a series
corresponding, maybe, to different villages or districts. The compatibility
or otherwise of the three series may then be judged by
verifying the relation

$$\text{yield rate} = \frac{\text{total yield}}{\text{total area}}$$


Again, if figures for the price of a commodity during two
different periods are given, together with the percentage increase
in price in the later period over the earlier, the data may be checked
by seeing if the relation

percentage increase = 100 × (price in period II - price in period I) / (price in period I)

is satisfied.
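
The consistency checks described above lend themselves to a small routine. The following is only a minimal sketch: the function names, the tolerance and the family records are hypothetical and serve merely to illustrate the relations quoted in the text.

```python
def savings_consistent(income, expenditure, savings, tol=0.5):
    """Check the relation savings = income - expenditure within a tolerance."""
    return abs(income - expenditure - savings) <= tol


def percentage_increase(price_1, price_2):
    """Percentage increase in price in period II over period I."""
    return 100 * (price_2 - price_1) / price_1


# Hypothetical records: (income, expenditure, reported savings) per family.
families = [(1200, 1000, 200), (900, 850, 50), (1500, 1100, 300)]

# Indices of families whose three figures are mutually incompatible.
suspect = [i for i, record in enumerate(families)
           if not savings_consistent(*record)]

print(suspect)
print(percentage_increase(40, 50))
```

A record flagged in this way is not necessarily wrong in all three figures; it only tells the enquirer where to look further.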

It should, however, be obvious that no hard and fast rules may
be laid down for the scrutiny of data. The enquirer must use his
common sense, judgment and whatever knowledge he may have
about the field of enquiry to assess the reliability of the data.

It will be found, as the reader makes progress with the study of 
statistical methods, that certain statistical tools may be used to 
check the accuracy of figures—not the raw data of statistics, but 
figures derived from them according to some statistical concepts. 


5.4 Frequency data and non-frequency data

Once the collected data are scrutinised and the errors therein are 
removed, one has to put them into a systematic form so as to bring 
into focus their salient features. Various modes of presentation may 
be suggested, depending on the way one would like to look at the 
data. 

First, consider the case where the values of one or more variables 
—e.g. population, foodgrain production, petroleum price, steel 
exports, etc.—are given for different points or periods of time. 
Generally in such a case, one will be interested in the relationship 
between time and the variable (or variables). For instance, one will 
like to know the way foodgrain production changes over time. Data 
of this kind are called time-series data or historical data. Or it may be
that values of one or more variables are given for different individuals
in a group (e.g. individual countries or individual States or
individual firms) for the same point or period of time. But instead of
considering the characteristics of the group as such, we study the
changes in the value (values) of the variable from country to
country, from State to State or from firm to firm. Data of this kind
are called spatial-series data. What is important is that the identity
of the individual values has to be kept in view, and not ignored, in
the statistical study of both kinds of data. Taken jointly, these two 
kinds constitute the class of non-frequency data. 

Second, consider the case where we still have data on one or 
more variables for different individuals—maybe even for different 
points or periods of time, for different regions of a country or for 
different countries—but the identity of the individuals is unimportant 
and can be ignored. For now we are interested in the character- 
istics of the group formed by the individuals rather than in those of 
the individuals themselves. In studying the intelligence quotients
(IQs) of 15-year-olds in Delhi, for instance, we may be interested in
such group characteristics as the percentage of 15-year-olds with IQ
higher than 130, the percentage of those with IQ between 110 and
130, or the average IQ of a 15-year-old in Delhi, or the lowest IQ
and highest IQ for 15-year-olds in the city. Once the IQ data for
all 15-year-olds in the city are obtained, one may in this case totally
forget which figure relates to which particular individual in the
group. This class of data is called frequency data, for here we are
interested simply in knowing how frequently each of the different
values of a variable occurs in the set of data.
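
To make the idea concrete, the group characteristics mentioned above may be computed without any reference to the identity of the individuals. The sketch below uses a short list of made-up IQ scores (an assumption for illustration only, not data from any survey).

```python
from collections import Counter

# Hypothetical IQ scores; which person each score belongs to is ignored.
iqs = [95, 102, 110, 134, 121, 88, 102, 140, 115, 99]

n = len(iqs)
pct_above_130 = 100 * sum(1 for x in iqs if x > 130) / n
pct_110_to_130 = 100 * sum(1 for x in iqs if 110 <= x <= 130) / n
average_iq = sum(iqs) / n
lowest, highest = min(iqs), max(iqs)

# How frequently each distinct value occurs in the set of data.
frequency = Counter(iqs)

print(pct_above_130, pct_110_to_130, average_iq, lowest, highest)
print(frequency[102])
```

Rearranging the scores in any order leaves every one of these quantities unchanged, which is precisely why the identity of the individuals can be ignored for frequency data.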

As the reader will find for himself, most of the discussions in the
book will be devoted to the treatment of frequency data. But in the
following sections of the present chapter, we shall deal with the
common methods of presentation of non-frequency data. It will be
seen in the sequel that the advanced modes of treatment of
non-frequency data, especially of time-series data, are themselves
modifications and extensions of techniques devised primarily for
frequency data.


5.5 Textual and tabular presentation of data

One of the common methods of presenting numerical data 
is to use paragraphs of text. Most official agencies use this 
method, and we illustrate it here by means of an excerpt from the
“White Paper on General Budget” appearing in the publication
Budget for 1957-58 of the Ministry of Finance, Government of India:

“...... The value of imports increased from Rs. 418 crores in
April-November, 1955, to Rs. 535 crores in April-November, 1956.
Of this increase of Rs. 117 crores, Rs. 96 crores was accounted for
by the increase in the imports of machinery, iron & steel and other
metals. Imports of machinery increased from Rs. 73.5 crores in
April-November, 1955, to Rs. 105.8 crores in April-November,
1956; of iron and steel from Rs. 34.3 crores to Rs. 88 crores and of
other metals from Rs. 16.4 crores to Rs. 26.2 crores.

“The small decline in exports (from Rs. 388.4 crores in April-
November, 1955, to Rs. 378 crores in April-November, 1956) is
mostly accounted for by the decline in exports of oil (from Rs. 26.2
crores to Rs. 13.4 crores) and of cotton (from Rs. 24.6 crores to
Rs. 10.9 crores), which is largely explained by the poor crop of oil
seeds and cotton in 1955-56. Exports of manganese ore also declined
from Rs. 10.6 crores in April-November, 1955, to Rs. 6.8 crores in
April-November, 1956; of cotton textiles from Rs. 42.2 crores to
Rs. 40.1 crores, and of jute manufactures from Rs. 83.3 crores to
Rs. 79.5 crores......”

The textual mode of data presentation has an appeal to people 
with a literary bent of mind, who have a distaste for the drabness 
of a table, and to those who are not satisfied with the broad trend 
of the data that a graph or some other diagram may reveal.
Besides, the writer of the text may draw the attention of the reader 
to points that he considers to be of special importance (which is not 
possible in tabular or diagrammatic presentation). He can, so to 
speak, add flesh and blood to the type of skeleton that a mere table 
or diagram for the data can be. 

An obvious disadvantage of the method is that it is not very 
useful when a large mass of data is to be presented. The method 
will not be useful either if a large number of comparisons are 
to be made or if attention of the reader is to be drawn to a large 
number of points. Moreover, in the case of a long textual statement, 
some repetition of explanatory material is inevitable. In consequence, 
the reader may find the report monotonous and boring. 

Anyway, the textual method of data presentation is almost 
invariably employed in official reports and in reports sent to 
management, either alone or in combination with the tabular or 
the diagrammatic method. While no hard and fast rules can be 
laid down for this mode of data presentation, care should be taken 
to make the report objective and unambiguous. It should be brief
and precise and should follow a logical sequence. 

The presentation of data by means of tables is generally preferred 
by statisticians, because a table can show the data in a compact 
form and a complete table with its title, heading and sub-headings 
can bring all the essential features of the data into a clearer perspec- 
tive. Again, this type of presentation does not call for any repeti- 
tion of explanatory material and, provided the table is properly 
constructed, it enables comparisons to be easily made and attention 
to be readily drawn to important features of the data. Another
advantage of this mode of data presentation is that here errors and
omissions, if any, may be readily detected—which is not the case 
with the textual method. 

The different parts of a table are : 

(i) Title. A title giving a brief description of the contents 
should always form part of the table. Usually it is placed at the 
head of the table, together with an identifying number that may 
be used for future reference. 

(ii) Stub. The extreme left part of the table, which is meant 
to describe the nature of the rows, is called the stub of the table. 

(iii) Caption. The upper part of the table which gives a 
description of the various columns is the caption of the table. The 
caption may include a mention of the units of measurement for the 
data of each column and also column numbers, like (1), (2), etc., 
that may be quoted in any future reference. The title, stub and 
caption, taken together, are said to form the box head of the table. 

(iv) Body. The body is the principal part of the table, where 
all the relevant figures are exhibited.

(v) Footnotes. Most tables also have footnotes indicating the 
sources of the data, which may also include explanations regarding 
the scope and reliability of particular items. A footnote to a table 
showing the census population of India at some consecutive counts 
may, for instance, show that the figure for a particular year is 
provisional or only a rough estimate, while the figure for another 
year relates to undivided India. 


While no hard and fast rules may be laid down, some broad
guidelines may be given for drawing up a table. First, the table
should be well balanced in length and breadth. In case a single
table may become too long or too wide, one should consider whether
the data may be divided into two or more tables. Second, as in
textual presentation, here too the arrangement of items should
follow a logical sequence. For instance, when time-series data
are to be presented, they should be arranged chronologically.
Third, in order to facilitate comparison between important items,
the figures to be compared should be placed as close to each other
as possible. It may also be pointed out that column-wise
comparisons are more easily made than row-wise comparisons.




To illustrate the above points, the data presented earlier in a 
textual form are also shown in Table 5.1. 

TABLE 5.1

VALUE OF IMPORTS INTO AND OF EXPORTS FROM INDIA DURING
APRIL-NOVEMBER, 1955, AND APRIL-NOVEMBER, 1956

                      Value in crores of rupees
Item                  April-November,  April-November,  Increase (+) or
                      1955             1956             decrease (-)
(1)                   (2)              (3)              (4)

IMPORTS
Machinery               73.5            105.8             + 32.3
Iron and steel          34.3             88.0             + 53.7
Other metals            16.4             26.2             +  9.8
Other imports          293.8            315.0             + 21.2
Total imports          418.0            535.0             +117.0

EXPORTS
Oil                     26.2             13.4             - 12.8
Cotton                  24.6             10.9             - 13.7
Manganese ore           10.6              6.8             -  3.8
Cotton textiles         42.2             40.1             -  2.1
Jute manufactures       83.3             79.5             -  3.8
Other exports          201.5            227.3             + 25.8
Total exports          388.4            378.0             - 10.4

Source: “White Paper on General Budget”, Budget for 1957-58, Ministry of
Finance, Government of India.
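
One advantage of the tabular form, noted earlier, is that errors and omissions are readily detected. As a hedged illustration, the export figures of Table 5.1 may be checked mechanically: the items should add up to the stated totals, and the change for each item follows from its two values. The variable names below are our own and are not part of the source.

```python
# Export rows as read from Table 5.1 and the quoted passage:
# item -> (April-November 1955, April-November 1956), in crores of rupees.
exports = {
    "Oil": (26.2, 13.4),
    "Cotton": (24.6, 10.9),
    "Manganese ore": (10.6, 6.8),
    "Cotton textiles": (42.2, 40.1),
    "Jute manufactures": (83.3, 79.5),
    "Other exports": (201.5, 227.3),
}
stated_totals = (388.4, 378.0)

# Column totals, rounded to one decimal place as in the table.
column_totals = tuple(
    round(sum(values[k] for values in exports.values()), 1) for k in (0, 1)
)

# Increase (+) or decrease (-) for each item.
changes = {item: round(later - earlier, 1)
           for item, (earlier, later) in exports.items()}

print(column_totals == stated_totals)
print(changes)
```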


5.6 Diagrammatic representation of data 

Representation of statistical data by diagrams—by graphs, charts 
or pictures—is more effective than tabular representation, being 
easily intelligible to a layman. Indeed, diagrams are almost essential 
whenever it is required to convey any statistical information to the 
general public. It must be stated, however, that information on a 
limited number of topics only can be presented in a single diagram 
so as to maintain its neatness. Moreover, a diagram can give only 
a rough idea about the magnitude of variation, whereas in a table 
the exact values may be quoted.




The more important types of diagram which are used in statistical
work are described below.

Line diagrams: Consider the data of Table 5.2, which show how
the number of tourists coming to India has been changing over time.
A very convenient method of representing such data is to use a
line diagram.

TABLE 5.2

NUMBER OF TOURISTS VISITING INDIA (EXCLUDING INDIAN
NATIONALS ABROAD) DURING 1951-1957

Year      Number of tourists (000)

1951      19.9
1952      25.4
1953      28.1
1954      39.2
1955      43.6
1956      68.9
1957      80.5

Source: Statistical Year Book, 1952, United Nations.


We take the year along the horizontal axis and the number of 
tourists along the vertical. The numbers for the seven years give 
seven points on the graph, which are next joined by line segments. 
The resulting line diagram is shown in Fig. 5.1. 

It may be noted that the line diagram would be exactly a straight 
line in case the values increased or decreased at a constant rate. 

If we want to compare the volumes of tourist traffic for a number 
of countries, say India, the U.K. and the U.S.A., we may draw 
a line diagram for each country on the same graph paper. To 
distinguish the three diagrams, we may use, say, continuous lines for 
India, broken lines for the U.K. and dotted lines for the U.S.A. 

A variant of the line diagram is the semi-logarithmic chart or
ratio chart, where the vertical scale is logarithmic, but the horizontal 
scale is of the usual arithmetic type. Since the vertical scale is 
logarithmic, the distance between any two values on this scale is 
proportional to the difference of their logarithms. Obviously, if the 
variable under enquiry is increasing or decreasing at a constant ratio, 
then the ratio chart will be exactly a straight line. The data of 
Table 5.2 are represented in a ratio chart in Fig. 5.2. A ratio chart
is to be used when the diagram is to show the relative changes in
the variable rather than the absolute magnitudes of change.
To compare the relative growth or decline of two or more series,
too, one may draw the ratio charts for the series and compare the
slopes of the charts. A ratio chart is also useful for representing a
time series with widely different values.

Fig. 5.1 Line diagram showing the number of tourists
coming to India from abroad during 1951-1957.
(Vertical axis: number of tourists (000); horizontal axis: years 1951 to 1957.)

Fig. 5.2 Ratio chart showing the number of tourists
coming to India from abroad during 1951-1957.
(Vertical axis, logarithmic: number of tourists (000); horizontal axis: years 1951 to 1957.)
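
The straight-line property of the ratio chart can be verified numerically. The sketch below, offered only as an illustration, takes the figures of Table 5.2, computes the logarithms that a ratio chart would plot and the year-to-year ratios, and then shows for a hypothetical series growing at a constant ratio of 1.5 that the successive logarithmic steps are all equal.

```python
import math

# Number of tourists (000) for 1951-1957, as given in Table 5.2.
tourists = [19.9, 25.4, 28.1, 39.2, 43.6, 68.9, 80.5]

# Heights on a ratio chart are proportional to the logarithms of the values.
log_values = [math.log10(v) for v in tourists]

# Year-to-year ratios; equal ratios would give a straight ratio chart.
ratios = [later / earlier for earlier, later in zip(tourists, tourists[1:])]

# Hypothetical series growing at a constant ratio of 1.5 per period.
geometric = [10 * 1.5 ** k for k in range(5)]
log_steps = [math.log10(b) - math.log10(a)
             for a, b in zip(geometric, geometric[1:])]

print([round(r, 2) for r in ratios])
print([round(s, 4) for s in log_steps])  # equal steps, hence a straight line
```

For the tourist data the ratios are unequal, so the ratio chart is not a straight line; its steepest segment corresponds to the largest ratio.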

In drawing a line diagram, the following points should be borne 
in mind : 

First, none of the axes should be too long or too short in com- 
parison to the other, for fluctuations may seem over-emphasised in 
the first case and almost ironed out in the second. 

Secondly, the zero of the vertical scale (but not necessarily that 
of the horizontal) should be included in the diagram ; otherwise, the 
diagram may give a false impression about the magnitude of fluctua- 
tions. However, if this is done in case the zero is too far below the 
actual range of variation, then the graph will lie at the top of the 
diagram. Here the actual variation may be brought into focus and 
the diagram made more agreeable to the eye by showing a definite 
break in the vertical axis, as in Fig. 5.4. 

If the data on the variable under enquiry, varying over time, are
given for each of a number of components, then we may use a type
of line diagram called component part chart or band chart. It is actually
a number of line diagrams, one for each component, superimposed
one upon another. This is illustrated in Fig. 5.3, which represents
the expenditure on the defence services of India, separately for the
Army, Navy and Air Force, for each year from 1950-51 to 1956-57.
To distinguish the three parts of the diagram, three different shades
have been used. Here the height of each band gives the expenditure
on the corresponding defence service for different years, while the
total height represents the total expenditure.

Fig. 5.3 Net expenditure on defence services, Govt. of India,
1950-51 to 1956-57 (* revised estimate; + budget estimate).
(Source: The Central Budget in Brief, 1956-57, Govt. of India)


TABLE 5.3

AREA UNDER WHEAT WITH IRRIGATION FACILITIES AND YIELD
RATE OF WHEAT (INDIA, 1947-48 TO 1954-55)

Year        Area irrigated    Yield rate
            (000 acres)       (lb. per acre)

1947-48     6,882             599
1948-49     7,420             566
1949-50     7,618             584
1950-51     8,407             592
1951-52     8,506             582
1952-53     9,123             681
1953-54     9,601             670
1954-55     9,810             717

Source: Statistical Abstract, India, 1956-57, C.S.O., Govt. of India.


Another type of line diagram is the multiple-axis chart, which is 
meant to show the relationship between two or more series of 
data. In Table 5.3, the part of the area under wheat coming under
irrigation and the yield-rate of wheat are given for India for a
number of years. To show how the two series are related, we may
draw a chart of this kind. For this purpose, we have to construct 
two line diagrams, one for each series, with a common horizontal 
axis but different vertical axes (Fig. 5.4). 

Fig. 5.4 Relationship between irrigation and yield-rate of
wheat (data for India, 1947-48 to 1954-55).
(Vertical axes: area under wheat irrigated (000 acres) and yield rate of wheat (lb. per acre); horizontal axis: years 1947-48 to 1954-55.)

Bar diagrams: Another mode of diagrammatic representation of
data is the use of bar diagrams. These have more general applicability
than line diagrams in the sense that they may be used for series
varying either over time or over space. In this method, bars of equal
width are taken for the different items of the series, the length of a bar
representing the value of the variable concerned. It is preferable to
take the bars horizontally for data varying over space and vertically
in the case of a series varying over time. In Fig. 5.5 a bar diagram
is shown, which represents the data of Table 5.2. Another bar
diagram is given in Fig. 5.6.
Fig. 5.5 Bar diagram indicating the growing number of
tourists coming to India from abroad (Table 5.2).

Fig. 5.6 Bar diagram showing production of tea (000 metric tons)
in the principal tea-growing countries, 1956.

A variant of the bar diagram is the multiple bar diagram, which is
employed in comparing two or more series of data on the same
variable. Thus we may want to compare the population figures, as
recorded in a number of censuses, for two or more countries. Or we
may like to compare the yield of paddy for a number of States for two
or more time periods. Data of a similar type appear in Table 5.4.
Here we are primarily interested in knowing how the rural
population of West Bengal compared with the urban population in each of
the decennial censuses. To show this diagrammatically, it is
appropriate to use four sets of bars (for the four census years), each set
containing two bars for the two sections of the population (Fig. 5.7).

TABLE 5.4

RURAL AND URBAN POPULATION OF WEST BENGAL
ACCORDING TO THE CENSUSES OF 1921-1951

               Population (in millions)
Census year    Rural       Urban

1921                       2.432
1931                       2.897
1941           17.197      4.679
1951           20.025      6.282

Source: Statistical Abstract, West Bengal, 1957, Govt. of West Bengal.

Fig. 5.7 Rural and urban population of West Bengal,
according to census figures.

Pictorial diagrams: The most vivid way of presenting numerical
data is by using some pictorial device. Here a suitable symbol is
first chosen to represent a certain number of units of the variable.
Next, each value in the given series of data is represented either by
taking a similar symbol, its size being proportional to the value, or by
taking a number of symbols (including a fraction of a symbol) of the
same size. The second method is preferable, because here an idea
about the actual values is more readily obtained by looking at the
diagram. Some of the data of Table 5.2, showing India’s growing
attractions to tourists from abroad, are presented in this manner
in Fig. 5.8.

Fig. 5.8 Pictorial diagram showing number of tourists
coming to India from abroad (Table 5.2).
(Each symbol represents 10,000 tourists.)

TABLE 5.5

INVESTMENT ON DIFFERENT HEADS UNDER
THE THIRD FIVE YEAR PLAN

Item                                             Proposed investment
                                                 (Rs. crores)

1. Agriculture, minor irrigation and community
   development                                    1,475
2. Major and medium irrigation                      640
3. Power                                            975
4. Village and small industries                     435
5. Industries and minerals                        2,500
6. Transport and communication                    1,650
7. Social services                                1,725
8. Inventories                                      800

   Total                                         10,200

Source: Third Five Year Plan, Draft Outline, Planning Commission, Govt.
of India.

TABLE 5.6

PERCENTAGES IN THE DIFFERENT CATEGORIES OF TABLE 5.5
AND EQUIVALENT ANGLES TO BE USED IN A PIE DIAGRAM

Item                                             Percentage    Angle to be used
                                                 investment    (degrees)

1. Agriculture, minor irrigation and community
   development                                    14.5          52.2
2. Major and medium irrigation                     6.3          22.7
3. Power                                           9.5          34.2
4. Village and small industries                    4.3          15.5
5. Industries and minerals                        24.5          88.2
6. Transport and communication                    16.2          58.3
7. Social services                                16.9          60.8
8. Inventories                                     7.8          28.1

   Total                                         100.0         360.0

Representation of percentages: When the values of a variable are
given for a number of categories, we may be interested in the
percentages for the different categories rather than in the absolute values,
for the percentages are expected to give a better idea of the relative
importance of each class. For this purpose, one may draw either a
divided-bar diagram or a pie diagram. In the first case, a bar of suitable
length and width is taken, its total area being regarded as 100. If
a vertical bar is chosen, then this area is divided into a number of
sections by drawing lines parallel to the base, in such a way that the
area of each section represents the percentage for the corresponding
category. In the second case, a circle is used, the area enclosed by
it being taken as 100. It is then divided into a number of sectors by
drawing angles at the centre, the area of each sector representing the
corresponding percentage. Since the full angle at the centre is 360°,
it is clear that for any particular category the angle (in degrees)
should be 3.6 times the corresponding percentage. The data on
the total investment under the Draft Third Five Year Plan, given in
Table 5.5, are represented by both these methods in Figs. 5.9 and 5.10.

Fig. 5.9 Divided-bar diagram showing proposed investment
on different heads under the Third Five Year Plan.


Fig. 5.10 Pie diagram showing proposed investment on
different heads under the Third Five Year Plan.


In order to draw a divided-bar diagram or a pie diagram, it 
is convenient to form beforehand a table of percentages (and for 
a pie diagram, also the corresponding angles to be drawn at the 
centre of the circle). These are shown in Table 5.6.
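
The percentages and angles can be obtained from the investment figures of Table 5.5 by the rule stated above (angle = 3.6 × percentage). The following is a minimal sketch with variable names of our own choosing. Note that it applies the factor 3.6 to the unrounded percentages, so a printed value may differ in the last digit from one obtained by first rounding the percentage, or from a percentage adjusted so that the rounded column adds up to exactly 100.

```python
# Proposed investment (Rs. crores) by head, as given in Table 5.5.
investment = {
    "Agriculture, minor irrigation and community development": 1475,
    "Major and medium irrigation": 640,
    "Power": 975,
    "Village and small industries": 435,
    "Industries and minerals": 2500,
    "Transport and communication": 1650,
    "Social services": 1725,
    "Inventories": 800,
}

total = sum(investment.values())

# Percentage share of each head and the angle at the centre of the pie.
percentages = {head: 100 * value / total for head, value in investment.items()}
angles = {head: 3.6 * pct for head, pct in percentages.items()}

for head in investment:
    print(f"{head}: {percentages[head]:.1f} per cent, {angles[head]:.1f} degrees")
```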

Statistical maps: Statistical information is sometimes presented
in maps. This is appropriate when our concern is to show how a
variable changes from one part of a region to another. Such a situation
arises, for instance, when it is necessary to show diagrammatically
the variation in rainfall, in density of population or in yield-rate of a
crop over the different parts of India. For this purpose, an outline
map of the region is taken, and then the different parts of the
map are shaded, a deeper shade indicating a larger value of the
variable. Alternatively, points may be used to represent the value
of the variable for each part of the region, the higher the value
of the variable the greater being the density of points. A statistical
map of this type is given in Fig. 5.11.

Fig. 5.11 A statistical map showing distribution of population
by States in India, according to the 1971 census.
(Map title: India, distribution of population by States and Union Territories, 1971 census.)




Questions and exercises 


5.1 Describe the different methods of collection of numerical
data and state their relative merits and defects.

5.2 Distinguish between frequency data and non-frequency
data. Mention the important points that are to be remembered in
the tabular representation of non-frequency data.

5.3 Give an account of the different modes of diagrammatic
representation of statistical data.

5.4 What is a ratio chart? Discuss its uses and point out its
advantages over a simple line diagram.

5.5 Scrutinise the following data and state which of the figures,
if any, you consider unreliable. Give reasons for your answer.


Census year    Population of country A (000)

1900           212
1910           235
1920           274
1930           510
1940           325
1950           378
1960           310


5.6 Examine the three series of figures given in the following table 
and state whether they may be regarded as mutually compatible. 


Area under crop    Total yield    Yield-rate
(000 acres)        (000 tons)     (lb. per acre)

74,424                            728.63
 9,325             2,944          707.79
26,842             8,539          712.59
 7,999             2,786          780.17
43,456             9,092          468.65
27,350             3,555          491.16

1 ton = 2,240 lb.

5.7 Represent the information contained in the following 
passage in a suitable tabular form : 

“The cropped area of vegetables (excluding potatoes) grown for 
human consumption in the United Kingdom rose in 1955-56 and was 
the highest since 1950-51. The cropped area increased to 509,000 
acres, some 11,000 acres more than in 1954-55. The area of root
vegetables increased by 8,100 acres to 62,400 acres, carrots alone
increasing by 5,700 acres to 33,200 acres. The area of cabbage rose
slightly, thus halting the steady decline since 1947-48; the cropped
area was 75,700 acres with 74,800 acres in 1954-55. The cropped
area of cauliflower and broccoli was 33,400 acres, 2,400 acres less
than in 1954-55. Peas harvested dry decreased by about 9,800 acres
to 121,800 acres, but a larger area of beans, mainly broad beans and
green peas, was grown. The area of broad beans increased by
2,600 acres to 7,300 acres and the area of green peas for canning 
and quick freezing rose by 7,000 acres to 50,400 acres.” 

5.8 Use a suitable diagram to represent the following data 
relating to the Post and Telegraphs Department, Govt. of India 
(taken from Statistical Abstract of the Indian Union, 1966 and 1968) : 


Year       Net receipts          Year       Net receipts
           (lakhs of rupees)                (lakhs of rupees)

1955-56    565.32                1961-62    497.56
1956-57    880.33                1962-63    567.12
1957-58    645.03                1963-64    963.80
1958-59    954.09                1964-65    871.33
1959-60    859.09                1965-66    516.17
1960-61    425.34                1966-67    936.09

5.9 The actual outlay on the public sector in the First and Third 
Five-Year Plans of India is shown below by head of development : 


Head of development                        First Plan outlay    Third Plan outlay
                                           (Rs. crores)         (Rs. crores)

Agricultural and community development       290                1,096
Irrigation and power                         583                1,927
Industries and mining                         97                1,965
Transport and communications                 518                2,113
Social services                              412                1,422
Miscellaneous                                 60                   85

Total                                      1,960                8,608




Draw suitable diagrams to show the relative importance attached 
to the various heads in each Plan. Hence make a comparison 
between the First and the Third Plan. 


SUGGESTED READING 


[1] Allen, R.G.D. Statistics for Economists (Chs. 1, 3). Hutchinson
Univ. Library, 1951.

[2] Croxton, F. E. and Cowden, D. J. Applied General Statistics
(Chs. 1-6). Prentice-Hall, 1964, and Prentice-Hall India.

[3] Fisher, R.A. Statistical Methods for Research Workers (Chs. 1-2).
Oliver & Boyd, 1954.

[4] Jenkinson, B. L. Bureau of the Census Manual of Tabular
Presentation. U.S. Government Printing Office, 1949.

[5] Mills, F.C. Statistical Methods (Chs. 1-3). H. Holt, 1955.

[6] Modley, R. How to Use Pictorial Statistics. Harper, 1937.

[7] Moroney, M.J. Facts from Figures. Penguin, 1956.

[8] Myers, J. W. Statistical Presentation. Littlefield, Adam & Co.,
1956.

[9] Tippett, L.H.C. Statistics. Oxford University Press, 1943.

[10] Wallis, W. A. and Roberts, H. V. Statistics: a New Approach
(Chs. 1-3, 5). Methuen, 1957.


6 FREQUENCY DISTRIBUTIONS


6.1 Summarisation of data 

In Chapter 5 we have discussed some elementary methods of 
dealing with numerical data in the detailed form in which they are 
collected. Often, however, the raw data will be so numerous that 
their significance will not be readily comprehended. In such cases it 
will become necessary to summarise the data to an easily manageable 
form. This kind of summarisation, of course, leads to some sacrifice
of information. But it will not be a serious drawback unless we are
interested in the individual (country, person or object) to which each 
figure refers. This is the case, for instance, when we are ultimately 
interested in information regarding such points as the minimum or 
the maximum height for a group of students, the percentage of 
students having height between 160 cm. and 166 cm., etc., and not 
in the height of each and every student of the group. And, as we 
have emphasized in Section 5.2, in statistics we are concerned with 
such properties of groups or aggregates, rather than with the 
properties of the individuals forming them. 

For the present, it will be assumed that the order in which the 
data are obtained is not relevant to the problem under enquiry. 
Some cases where the order of the data is important will be discussed
in Volume 2. 


6.2 Attribute and variable 

It should be noted that although statistics always deals with 
numerical data, such data may arise in one of two ways. In some 
cases the data are numerical to start with, e.g. when we record the 
height for each of a group of men or the number of rooms in each
house of a town. In other cases numbers arise only secondarily.
When we record the sex of each newborn baby during a month or
the language of each book in a library, the data are not numbers
initially. We get numbers if, subsequently, we note the number of
male babies and that of female babies, or the number of books 
written in English, the number written in Hindi, the number written 
in Bengali and so forth. 




We may, therefore, say that the first type of data arise if we are 
observing, for each individual of a group, a character which can be 
expressed in numbers. Such a character will be referred to as a 
quantitative character or a variable or a variate. For the second type 
of data, the character observed (viz. the sex of a baby or the language
in which a book is written) is not expressible in numerical terms. 
Such a character is, therefore, called a qualitative character or an 
attribute. 

The distinction between a variable and an attribute should be 
clearly borne in mind, for they will generally require different 
methods of statistical treatment. 


6.3 Frequency distribution of an attribute 

In the course of an investigation conducted by the Indian Market 
Research Bureau in 1973, 1,674 inhabitants of Calcutta, Bombay 
and Madras were interviewed. Each was asked, among other 
questions, whether he/she knew about the Indian Airlines 
employees’ agitation of that time. On getting the data, the sponsors 
of the investigation put them into a systematic form. They just 
counted the number of those who knew about the agitation among 
the people interviewed and got the following table : 


TABLE 6.1 
RESULT OF SURVEY ON AIRLINES EMPLOYEES’ AGITATION 

State of knowledge      Number of people (frequency) 

Aware                          619 
Unaware                      1,055 

Total                        1,674 


The number 619 shows how many of the people interviewed 
were aware of the agitation. In statistical language, this is the 
frequency of the form ‘aware’ of the attribute, say, ‘state of knowledge’, 
because it tells us how frequent this form was among the people 
interviewed. Similarly, the number 1,055 is the frequency of the 
form ‘unaware’. 




Perhaps a better picture is obtained if one uses, instead of the 
frequencies, the proportions (or the relative frequencies, as they are 
called). These are shown in Table 6.2. 


TABLE 6.2 
PROPORTION OF PEOPLE AWARE OF 
AIRLINES EMPLOYEES’ AGITATION 

State of knowledge      Relative frequency 


Table 6.1 shows how the total frequency 1,674 is distributed over 
the two classes, ‘aware’ and ‘unaware’. Such a table is, therefore, said 
to give a frequency distribution—in this case, the frequency distribution 
of an attribute that may be called ‘state of knowledge’. Table 6.2 
presents the same frequency distribution in a different form. 
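
The passage from frequencies to relative frequencies can be sketched in a few lines of Python. This is only an illustration, not part of the original text: the function name `relative_frequencies` is our own, and the counts are those of Table 6.1.

```python
def relative_frequencies(frequencies):
    """Divide each class frequency by the total frequency."""
    total = sum(frequencies.values())
    return {form: count / total for form, count in frequencies.items()}

# Frequencies of the two forms of the attribute, from Table 6.1.
table_6_1 = {"aware": 619, "unaware": 1055}

proportions = relative_frequencies(table_6_1)
for form, proportion in proportions.items():
    print(f"{form:8s} {proportion:.4f}")
```

The proportions necessarily add up to unity, whatever the number of classes.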

Tables 6.1 and 6.2 present a dichotomy, i.e. a classification of 
individuals into two classes. We may as well have frequency 
distributions of attributes with more than two classes. For instance, 
in the same survey, again, the people who knew of the agitation 
were asked whether they were sympathetic to the agitation or not. 
Their answers led to the following frequency distribution with three 
classes : 

TABLE 6.3 
ATTITUDE TOWARDS AIRLINES EMPLOYEES’ AGITATION 

Attitude                Number of people (frequency) 




Data regarding attributes may be represented graphically on the 
basis of either the frequencies or the relative frequencies. Since the 
data shown in the form of frequencies, as in Table 6.1 or Table 6.3, 
are similar to those in Table 5.5, they may be represented by means 
of a bar diagram, preferably with horizontal bars. Similarly, the 
data given in the form of relative frequencies, as in Table 6.2, being 
comparable to those in Table 5.6, may be shown in a pie diagram 
or a divided-bar diagram. 


6.4 Discrete and continuous variables 

When we pass on to the study of data regarding quantitative 
characters, it is immediately found that these may be of two principal 
types. In the first place, the character may take only some isolated 
values, like the number of letters in a word (word-length), number of 
petals in a flower, number of members in a family (family-size) and 
so forth. Alternatively, it may conceivably take any value within its 
range of variation. The height, weight or age of a man, the diameter 
of a bobbin, the temperature, rainfall or humidity in a region, etc., 
are variables of this type. Even in the second case, the actual 
measurements will present a discreteness, e.g. when heights are given 
correct to the nearest cm. or when weights are given correct to the 
nearest kg. But this discreteness, it should be noted, is completely 
artificial, being due to the limitations of the measuring instrument. 
Variables of the first type are called discontinuous or discrete, while 
those of the second type are called continuous. 

The distinction between a discrete variable and a continuous 
variable is again to be borne in mind, because the statistical treat- 
ment of the one may differ in some cases from that of the other. 


6.5 Frequency distribution of a variable 

Going to market one winter morning (in 1961), one of the authors 
bought, among other things, peas worth 25 paise. Back home from 
the market, he found there were 198 pea-pods in his bag. He took 
each pod and counted the number of peas it contained. The figures 
thus obtained are given below : 




TABLE 6.4 

NUMBER OF PEAS IN EACH OF 198 PEA-PODS 

[The 198 individual counts are illegible in this copy. Each is a whole 
number from 1 to 7, and their frequencies are those shown in Table 6.6.] 


The significance of this mass of data cannot be easily compre- 
hended. A need is immediately felt to summarise the data to a more 
comprehensible form. The first step in the process of summarisation 
is to count the number of pods with 1 pea, with 2 peas, with 3 peas, 
etc., and thus form a frequency distribution of the discrete variable 
‘number of peas per pod’. The labour involved in counting can be 
minimised if we adopt the following procedure : 

On going through the whole set of figures, we find that the 
largest value is 7 and the smallest 1. We, therefore, form a table 
with seven classes for the seven values : 1, 2, ......, 7. Next, we take 
the given values of the variable one by one and for each value place 
a stroke (a tally mark) in the table against the appropriate class. 
To facilitate counting, the tally marks are arranged in blocks of five, 
every fifth stroke being drawn across the preceding four. This is 
done in Table 6.5. 
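
The tallying procedure just described can be sketched in Python as follows. This is a hedged illustration only: the sample values are hypothetical (they are not the data of Table 6.4), the function names are our own, and a block of five strokes is shown as `||||/`.

```python
from collections import Counter

def frequency_table(values):
    """One class for each whole number from the smallest to the largest value."""
    counts = Counter(values)
    return {v: counts.get(v, 0) for v in range(min(values), max(values) + 1)}

def tally_marks(frequency):
    """Strokes in blocks of five; '/' stands for the fifth stroke drawn across."""
    blocks, rest = divmod(frequency, 5)
    return " ".join(["||||/"] * blocks + (["|" * rest] if rest else []))

# Hypothetical counts of peas per pod, for illustration only.
sample = [4, 3, 3, 5, 2, 3, 4, 4, 3, 1, 3, 6, 3, 4, 3, 2, 3]

table = frequency_table(sample)
for value, frequency in table.items():
    print(f"{value}  {tally_marks(frequency):<12} {frequency}")
```

Taking the classes from the smallest to the largest observed value, rather than from the values actually present, ensures that a value which never occurs still appears in the table with zero frequency.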


TABLE 6.5 
TALLY MARKS FOR THE VALUES IN TABLE 6.4 

Number of peas      Tally marks 

[The tally marks are illegible in this copy; the resulting frequencies 
are those shown in Table 6.6.] 


Finally, we count the number of tally marks in each class and 
get the frequency distribution of the variable, which is shown in 
the first two columns of Table 6.6. 


TABLE 6.6 
FREQUENCY DISTRIBUTION OF NUMBER OF PEAS 
PER POD FOR 198 PODS 

Number of peas      Frequency      Relative frequency 

      1                  4              0.0202 
      2                 33              0.1667 
      3                 76              0.3838 
      4                 50              0.2525 
      5                 26              0.1313 
      6                  8              0.0404 
      7                  1              0.0051 

    Total              198              1.0000 


The same frequency distribution may be represented in a number 
of other ways. In the first place, one may show relative frequencies 
instead of frequencies, as in the case of an attribute. These will give 
the proportion of pods with k peas, for different values of k. But 
suppose one asks : “What is the number of pods with k peas or less?” 
To answer such questions, we may form a table giving cumulative 
totals of the frequencies proceeding from the lowest class upwards, 
called cumulative frequencies of ‘less-than type’. Similarly, the cumula- 
tive totals of the frequencies obtained by proceeding from the highest 
class of the table downwards are called cumulative frequencies of 
‘more-than type’. These cumulative frequencies give the number of 
pods with k peas or more, for different values of k. 


TABLE 6.7 
CUMULATIVE FREQUENCY TABLE FOR THE FREQUENCY 
DISTRIBUTION OF NUMBER OF PEAS FOR 198 PODS 

Number of peas      Cumulative frequency      Cumulative frequency 
                    (less-than type)          (more-than type) 


An alternative method of representing a frequency distribution 
is to show the cumulative proportions, which are the cumulative totals of 
relative frequencies. In the present case, they would give, for 
different values of k, the proportions of pods with k peas or less and 
with k peas or more. 
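
The cumulative frequencies and cumulative proportions can be obtained mechanically from the frequencies of Table 6.6. The following Python lines are a minimal sketch, with variable names of our own choosing:

```python
from itertools import accumulate

# Values of the variable and their frequencies, from Table 6.6.
values = [1, 2, 3, 4, 5, 6, 7]
frequencies = [4, 33, 76, 50, 26, 8, 1]
total = sum(frequencies)

# 'Less-than' type: running totals from the lowest class upwards.
less_than = list(accumulate(frequencies))
# 'More-than' type: running totals from the highest class downwards.
more_than = list(accumulate(reversed(frequencies)))[::-1]

for k, lt, mt in zip(values, less_than, more_than):
    # The cumulative proportions are the same totals divided by the total frequency.
    print(f"{k}  {lt:4d}  {mt:4d}  {lt / total:.4f}  {mt / total:.4f}")
```

For every k, the number of pods with k peas or less and the number with k + 1 peas or more together make up the total frequency, which gives a convenient check on the arithmetic.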

When we are concerned with an attribute or a discrete variable, 
generally the nature of the data will itself dictate the mode of classi- 
fication to be adopted. It would be natural to take one class for each 
form of an attribute or for each different value of a discrete variable. 
In the case of a continuous variable, on the other hand, if one takes 
a class for each different value of the variable, the number of classes 
may be unduly large, thus defeating the very purpose of classification, 
viz. the summarisation of data. In fact, since a continuous variable 
can, by definition, assume an infinite number of possible values, the 
classification of such data is necessarily artificial. The statistician 
himself will have to decide upon the appropriate classification to be 
adopted in any given case. Some general observations can, however, 
be made in this connection. Let us take for illustration the data of 
Table 6.8. 


TABLE 6.8 
STATURE (IN cm.) OF 177 INDIAN ADULT MALES 

169.0   164.5   154.2   163.0   171.6   157.5 
166.7   166.7   161.8   163.1   167.0   160.5 
159.9   165.0   156.5   157.6   169.5   170.3 
157.8   159.7   161.7   160.9   163.7   167.0 
169.9   158.9   145.6   152.5   162.5   171.3 
158.4   168.9   162.2   167.9   166.8   162.2 
171.7   163.9   162.0   164.2   160.2   169.4 
160.4   162.0   165.5   167.5   163.9   170.0 
167.5   165.2   167.2   164.2   171.0   166.6 
161.0   167.4   159.8   171.7   156.4   160.5 
168.8   172.8   169.0   167.7   170.0   160.6 
167.8   168.2   165.3   168.0   161.4   168.0 
164.0   166.8   158.0   168.1   160.5   155.9 
167.4   163.5   169.5   164.4   173.2   162.0 
167.8   159.3   169.2   165.2   174.2   161.4 
165.2   163.1   161.5   163.5   [illegible]   [illegible] 
163.5   168.9   166.7   176.4   161.4   156.0 
170.4   166.3   162.6   160.9   165.4   163.2 
159.0   164.5   171.3   164.2   160.0   172.0 
158.1   162.1   166.7   161.0   156.9   152.6 
157.6   182.0   168.0   158.4   164.9   167.1 
159.2   158.5   160.8   171.4   167.3   161.3 
167.7   183.5   168.0   159.5   159.5   170.1 
170.2   163.5   156.0   162.7   165.2   158.7 
169.0   170.1   169.0   160.5   160.8   167.0 
157.5   167.7   172.5   171.7   170.1   178.4 
161.5   157.4   171.1   163.7   165.4   165.5 
165.8   164.9   168.2   162.3   168.1 
159.6   168.3   172.6   171.9   168.7 
160.3   164.0   169.3   169.7   165.4 




For one thing, the classes should be exhaustive, i.e. should be 
such that each of the given values is included in one of the classes. 

Secondly, the classes should be mutually exclusive or non- 
overlapping. If the classes are overlapping, e.g. 144.5-149.5, 
149.5-154.5, etc., in the present case, difficulty will arise in 
classifying a value like 149.5. 

Thirdly, the number of classes should not be too large ; otherwise, 
the purpose of classification, viz. summarisation of data, will not be 
served. Moreover, by taking a large number of classes one will 
introduce an irregular pattern in the frequencies which may be 
completely absent in the actual distribution. (This applies to the 
case of an attribute or a discrete variable as well. If the number of 
different forms of an attribute or the number of different values of a 
discrete variable be too large, then each class should be constituted 
with a number of different forms or a number of different values.) 

The number of classes should not be too small either, for this also 
may obscure the true nature of the distribution. Further, we shall 
see in later chapters that in computing various statistical measures, 
like mean, standard deviation, etc., from a frequency distribution, 
the assumption is made that each class-frequency corresponds to a 
single value, viz. the mid-point of the set of values defining a class. 
If the number of classes be too small, each class will be too wide, 
and this assumption will make the computed value of the measure 
extremely unreliable. 

As a working rule, one should take ten to twenty classes, provided 
the total frequency is not small, say not less than 1,000. A still 
smaller number of classes may be taken if the total frequency be 
much smaller than 1,000. 

Lastly, the classes should preferably be of equal width. Other- 
wise, the class-frequencies will not be comparable, and the 
computation of statistical measures will be laborious. The principle, 
however, cannot be rigidly followed, as will be apparent from the 
frequency distribution of Table 6.11. In forming Table 6.11, if 
we took classes of the same width, the number of classes would be 
exceedingly large (if the width were about 10 rupees, e.g.) or 
exceedingly small (if the width were about 300 rupees, e.g.). In the 
latter case certain important features of the data would be obscured. 
It would not be apparent, for example, that more than 25% of 
the employees earn less than Rs. 177.5 or that about 60% of the 
employees earn less than Rs. 197.5. 

Bearing the above points in mind, we may take for the 
data of Table 6.8 the 8 classes defined by the following values of the 
variable: 144.6-149.5, 149.6-154.5, ......, 179.6-184.5. Here 
again it would be convenient to form a table like Table 6.5. As 
before, we go through the given values of the variable one by one 
and for each value put a stroke opposite the class to which the value 
belongs (Table 6.9). 
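
Assigning a value to its class is a matter of simple arithmetic once the classes are of equal width. The Python sketch below is illustrative only: the names `class_index` and `class_limits` are our own, and the six heights are those read from the first row of Table 6.8.

```python
LOWEST_LIMIT = 144.6   # lower class-limit of the first class (cm)
WIDTH = 5.0            # common width of the classes (cm)
N_CLASSES = 8

def class_index(height):
    """0-based index of the class containing a height recorded to 0.1 cm."""
    # Work in whole tenths of a cm so that values at the class-limits
    # are not misplaced by floating-point rounding.
    tenths = round(height * 10)
    index = (tenths - round(LOWEST_LIMIT * 10)) // round(WIDTH * 10)
    if not 0 <= index < N_CLASSES:
        raise ValueError(f"{height} lies outside the chosen classes")
    return index

def class_limits(index):
    """Lower and upper class-limits of the class with the given index."""
    lower = LOWEST_LIMIT + index * WIDTH
    return round(lower, 1), round(lower + WIDTH - 0.1, 1)

first_row = [169.0, 164.5, 154.2, 163.0, 171.6, 157.5]
for height in first_row:
    print(height, class_limits(class_index(height)))
```

Because the classes are exhaustive and mutually exclusive, every recorded value from 144.6 to 184.5 receives exactly one index; a value outside that range raises an error instead of being silently dropped.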


TABLE 6.9 
TALLY MARKS FOR THE DATA OF TABLE 6.8 

Class               Tally marks 

[The tally marks are illegible in this copy; the resulting 
class-frequencies are those shown in Table 6.10.] 


One more point is to be noted before we draw up the frequency 
distribution on the basis of Table 6.9. Consider the class 144.6-149.5. 
The values are recorded correct to one-tenth of a cm. Hence 144.6 
represents any value between 144.55 and 144.65. Similarly, 149.5 
represents any value between 149.45 and 149.55. Thus the class 
144.6-149.5 really stands for the class-interval 144.55-149.55. 
Similar is the case for the other classes. 144.6 and 149.5 are called 
the lower and upper class-limits for the first class, while 144.55 and 
149.55 are the corresponding class-boundaries. One should state the 
class-boundaries, rather than the class-limits, while drawing up 
the frequency distribution of a continuous variable. Table 6.10 
shows the frequency distribution in terms of the absolute frequencies, 
relative frequencies and cumulative frequencies. It should be noted 
that the cumulative frequencies of the less-than type correspond 
to the upper class-boundaries ; for instance, the third one, 28, is the 
number of persons with height 159-55 cm. or less. Similarly, the 
cumulative frequencies of the greater-than type correspond to the 
lower class-boundaries. 
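
The relation between class-limits and class-boundaries, and the pairing of the cumulative frequencies with the boundaries, may be sketched in Python as below. The function name `class_boundaries` is our own; the unit of recording (0.1 cm) and the class-frequencies are those stated above and in Table 6.10.

```python
from itertools import accumulate

UNIT = 0.1  # heights are recorded correct to one-tenth of a cm

def class_boundaries(lower_limit, upper_limit, unit=UNIT):
    """Move each class-limit outwards by half the unit of recording."""
    return round(lower_limit - unit / 2, 2), round(upper_limit + unit / 2, 2)

# Class-limits 144.6-149.5, 149.6-154.5, ..., 179.6-184.5 and the
# class-frequencies of Table 6.10.
limits = [(round(144.6 + 5 * i, 1), round(149.5 + 5 * i, 1)) for i in range(8)]
frequencies = [1, 3, 24, 58, 60, 27, 2, 2]

boundaries = [class_boundaries(lo, hi) for lo, hi in limits]
less_than = list(accumulate(frequencies))
greater_than = list(accumulate(reversed(frequencies)))[::-1]

for (lower, upper), lt, gt in zip(boundaries, less_than, greater_than):
    print(f"{gt:3d} persons with height {lower:.2f} cm or more; "
          f"{lt:3d} persons with height {upper:.2f} cm or less")
```

The upper boundary of each class coincides with the lower boundary of the next, so that the class-intervals cover the range without gaps.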


TABLE 6.10 

FREQUENCY DISTRIBUTION OF HEIGHT FOR 177 
INDIAN ADULT MALES 

                                                   Cumulative frequency 
Class-interval (cm.)   Frequency   Relative     ‘Less-than’   ‘Greater-than’ 
                                   frequency 

144.55-149.55               1       0.0057            1            177 
149.55-154.55               3       0.0169            4            176 
154.55-159.55              24       0.1356           28            173 
159.55-164.55              58       0.3277           86            149 
164.55-169.55              60       0.3390          146             91 
169.55-174.55              27       0.1525          173             31 
174.55-179.55               2       0.0113          175              4 
179.55-184.55               2       0.0113          177              2 

Total                     177       1.0000 


If the classes be of varying width, then the different class-frequ- 
encies will not be comparable. Comparable figures can be obtained 
if the frequencies are expressed per unit value of the variable by 
dividing them by the widths of the class-intervals. The ratios are 
called frequency-densities. Table 6.11 gives a frequency distribution 
where the classes are not equally wide. The frequency-densities 
appear in the third column of the table. 
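
As a check on the third column of Table 6.11, the frequency-densities may be recomputed from the class-boundaries and the frequencies. The short Python sketch below is an illustration only, and the function name `frequency_density` is our own.

```python
# (lower boundary, upper boundary, frequency) for each class of Table 6.11.
classes = [
    (142.5, 147.5, 8), (147.5, 152.5, 12), (152.5, 157.5, 47),
    (157.5, 162.5, 61), (162.5, 167.5, 105), (167.5, 177.5, 338),
    (177.5, 187.5, 271), (187.5, 197.5, 260), (197.5, 222.5, 265),
    (222.5, 247.5, 117), (247.5, 297.5, 142), (297.5, 347.5, 105),
    (347.5, 447.5, 66), (447.5, 597.5, 57), (597.5, 997.5, 9),
    (997.5, 1597.5, 5), (1597.5, 2597.5, 2),
]

def frequency_density(lower, upper, frequency):
    """Frequency per unit value of the variable (here, per rupee)."""
    return frequency / (upper - lower)

densities = [frequency_density(*c) for c in classes]
for (lower, upper, frequency), density in zip(classes, densities):
    print(f"{lower:8.1f} - {upper:8.1f}  {frequency:4d}  {density:7.3f}")
```

The class 197.5-222.5 has a larger frequency (265) than the class 187.5-197.5 (260), but a much smaller frequency-density (10.6 against 26.0), since it is two and a half times as wide. This is why the frequencies themselves are not comparable when the widths vary.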




TABLE 6.11 
FREQUENCY DISTRIBUTION OF MONTHLY INCOME FOR 1,870 
EMPLOYEES OF A MANUFACTURING FIRM 

Income (Rs.)            Frequency      Frequency-density 

  142.5-  147.5              8              1.600 
  147.5-  152.5             12              2.400 
  152.5-  157.5             47              9.400 
  157.5-  162.5             61             12.200 
  162.5-  167.5            105             21.000 
  167.5-  177.5            338             33.800 
  177.5-  187.5            271             27.100 
  187.5-  197.5            260             26.000 
  197.5-  222.5            265             10.600 
  222.5-  247.5            117              4.680 
  247.5-  297.5            142              2.840 
  297.5-  347.5            105              2.100 
  347.5-  447.5             66              0.660 
  447.5-  597.5             57              0.380 
  597.5-  997.5              9              0.022 
  997.5-1,597.5              5              0.008 
1,597.5-2,597.5              2              0.002 

Total                    1,870 


6.6 Graphical representation of the frequency distribution of a 
variable 

Let us consider the frequency distribution of a discrete variable 
like the one in Table 6.6. To represent such a distribution graphically, 
one may take two rectangular axes of co-ordinates, the horizontal 
for the variable and the vertical for the frequency. The different values 
of the variable are then located as points on the horizontal axis. 
Next, at each of these points a perpendicular is drawn whose length 
represents the corresponding frequency. Such a diagram may be called a 
column diagram or a frequency bar diagram (Fig. 6.1). 


Fig. 6.1 Column diagram for the frequency distribution of 
number of peas for 198 pods (Table 6.6). [Horizontal axis: number 
of peas; vertical axis: frequency.] 

An alternative method of representing such a distribution is to use 
a frequency polygon, in which case also the values and the correspon- 
ding frequencies are plotted as points with the help of rectangular 
co-ordinates, as in a column diagram. For Table 6.6, let us first 
plot the points (1, 4), (2, 33), ......, (6, 8), (7, 1) on graph paper. 
The value preceding 1, i.e. 0, has zero frequency ; so has the value 
following 7, i.e. 8. Hence let us take two more points, (0, 0) and 
(8, 0). Finally, we join the successive points by line segments to get 
a closed polygon (Fig. 6.2). 
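
The list of points to be joined can be written down mechanically. The following Python sketch is illustrative only; the function name `polygon_vertices` and its `step` argument (the spacing between successive values of the variable) are our own.

```python
# Values of the variable and their frequencies, from Table 6.6.
values = [1, 2, 3, 4, 5, 6, 7]
frequencies = [4, 33, 76, 50, 26, 8, 1]

def polygon_vertices(values, frequencies, step=1):
    """Vertices of the frequency polygon, closed on the horizontal axis."""
    first = (values[0] - step, 0)   # value preceding the smallest, zero frequency
    last = (values[-1] + step, 0)   # value following the largest, zero frequency
    return [first, *zip(values, frequencies), last]

vertices = polygon_vertices(values, frequencies)
print(vertices)
```

Joining the successive vertices by line segments, together with the stretch of the horizontal axis from the first vertex to the last, gives the closed polygon.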

A frequency polygon may also be used to represent the frequency 
distribution of a continuous variable, provided the classes are of 
equal width. The frequencies are now plotted against the mid-points 
of the corresponding class-intervals and joined successively by line 
segments, as in the case of a discrete variable. 

But a better method for a continuous variable is to use a histogram. 
Here on the horizontal axis we locate the class-boundaries, and over 
each class-interval we erect a rectangle whose area represents the 
corresponding class-frequency. Obviously, the height of a rectangle is 
to be taken equal to the corresponding frequency-density (Fig. 6.3). 
A histogram is sometimes used for a discrete variable as well, each 
value being now regarded as the mid-point of an interval. But its 
use here is not to be recommended, because in the discrete case each 
frequency really corresponds to a single point and not to an interval. 
However, the use of this method cannot be avoided in case each class 
includes more than one different value of the discrete variable 
(which happens when the range of the data is too wide compared to 
the total frequency). 

Fig. 6.2 Frequency polygon for the frequency distribution 
of number of peas for 198 pods (Table 6.6). [Horizontal axis: number 
of peas; vertical axis: frequency.] 

Fig. 6.3 Histogram representing the frequency distribution 
of height for 177 Indian adult males (Table 6.10). [Horizontal axis: 
height (cm.); vertical axis: frequency-density.] 


Fig. 6.4 Cumulative frequency diagram (less-than type) for 
the data on number of peas for 198 pods (Table 6.7). 

Fig. 6.5 Cumulative frequency diagrams for the frequency 
distribution of height for 177 Indian adult males (Table 6.10). 
(Continuous lines for ‘greater-than’ type, broken lines for 
‘less-than’ type.) 


A frequency distribution may be represented graphically on the 
basis of cumulative frequencies as well. To do this, the cumulative 
frequencies are plotted as points against the values to which they 
correspond. By joining the points, a cumulative frequency diagram 
or ogive is obtained. Special attention should be given to the 
discrete case. In Table 6.7, the cumulative frequency is 0 for 
any value less than 1, is 4 for any value greater than or equal to 
1 but less than 2, and so on. Hence the ogive will be a step 
diagram, as in Fig. 6.4. The ogives (of less-than and greater-than 
types) for the frequency distribution of Table 6.10 are shown 
in Fig. 6.5. 
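
The step character of the ogive in the discrete case can be made explicit by writing the cumulative frequency as a function of any real value x. The Python lines below are a sketch under that reading; the function name `cumulative_frequency` is our own and the frequencies are those of Table 6.6.

```python
from bisect import bisect_right
from itertools import accumulate

values = [1, 2, 3, 4, 5, 6, 7]
frequencies = [4, 33, 76, 50, 26, 8, 1]
less_than = list(accumulate(frequencies))

def cumulative_frequency(x):
    """Number of pods with x peas or less, for any real x."""
    k = bisect_right(values, x)   # number of distinct values not exceeding x
    return less_than[k - 1] if k else 0

for x in (0.5, 1, 1.9, 2, 6.5, 7, 10):
    print(x, cumulative_frequency(x))
```

The function stays constant between two successive values of the variable and jumps at each value by the corresponding frequency, which is the behaviour shown by a step diagram.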


6.7 Frequency curve 

When the variable under consideration is continuous, it is ins- 
tructive to visualise what form the frequency distribution would take 
if the number of observations were increased indefinitely. In any 
practical situation, one will, of course, have a finite—generally a 
rather small—number of observations and the frequency distribution 
will be formed with a small number of classes. But suppose the 
total frequency is gradually increased and that simultaneously the 
width of each class is gradually decreased so that the number of 
classes goes on increasing. The frequency distribution in such a 
case may be supposed to approach an ideal form whose nature 
may be visualised from the nature of the histogram. Suppose in 
drawing the histogram we use the relative frequencies (rather the 
relative frequency-densities) and not the frequencies (rather the 
frequency-densities) themselves. The total area of the histogram 
will then remain the same, viz. unity. But the histogram itself 
will usually come closer and closer to a smooth curve (as in 
Fig. 6.6). This limiting form of the histogram is called a frequency 
curve, which may be viewed as representing the true or ideal 
frequency distribution of the given continuous variable. In most 
situations, one will have ultimately in view a practically infinite 
set of individuals and the given individuals will be regarded as 
constituting only a small representative part of the infinite set. 
The notion of the ideal frequency distribution will then be highly 
pertinent. 
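
That the total area of the histogram remains unity, whatever the class width, when relative frequency-densities are used can be checked numerically. The Python sketch below uses the class-boundaries and frequencies of Table 6.10 and then merges neighbouring classes in pairs; the function names are our own and the merging is only an illustration of changing the class width.

```python
def relative_frequency_densities(boundaries, frequencies):
    """Heights of the rectangles when the total area is to be unity."""
    total = sum(frequencies)
    widths = [hi - lo for lo, hi in zip(boundaries, boundaries[1:])]
    return [f / total / w for f, w in zip(frequencies, widths)]

def histogram_area(boundaries, densities):
    """Sum of height times width over all the rectangles."""
    widths = [hi - lo for lo, hi in zip(boundaries, boundaries[1:])]
    return sum(d * w for d, w in zip(densities, widths))

# Eight classes of width 5 cm (Table 6.10).
boundaries = [144.55 + 5 * i for i in range(9)]
frequencies = [1, 3, 24, 58, 60, 27, 2, 2]
fine = relative_frequency_densities(boundaries, frequencies)

# Four classes of width 10 cm, obtained by merging neighbours in pairs.
coarse_boundaries = boundaries[::2]
coarse_frequencies = [a + b for a, b in zip(frequencies[::2], frequencies[1::2])]
coarse = relative_frequency_densities(coarse_boundaries, coarse_frequencies)

print(round(histogram_area(boundaries, fine), 10))
print(round(histogram_area(coarse_boundaries, coarse), 10))
```

Only the outline of the histogram changes with the class width; its area does not, and it is this outline that approaches the frequency curve as the classes become narrower and the observations more numerous.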


Fig. 6.6 Histograms, (a) to (c), gradually approach 
the frequency curve, (d). [Horizontal axis: variable; vertical 
axis: relative frequency density.] 

In order to place some of the concepts and techniques in their 
proper perspective we shall, in our discussion, often refer to this 
ideal distribution. 


6.8 Types of distribution 
Frequency distributions may be classified into certain broad types. 
We shall distinguish between the different types by referring to the 
frequency curve of a continuous variable, but a good idea about the 
form of the distribution may be had from the nature of the histo- 
gram, provided the total frequency and the number of classes are at 
least moderately large. Moreover, the distinctions will remain valid 
even when we come to frequency distributions of discrete variables, 
though the notion of frequency curve is no longer appropriate. 
The broad types of distribution are as follows : 
(a) Bell-shaped (i.e. unimodal) frequency distribution 
The frequency curve of a distribution of this type will have a 
single maximum near about the middle of the range and the fre- 
quency-density will decrease, either at the same rate or at different 
rates, towards the two ends of the range. Bell-shaped distributions 
again fall under two sub-categories : 
(a.1) Bell-shaped symmetrical distribution, whose curve is perfectly 
symmetrical having a maximum right at the middle of the range. 




Most distributions arising from measurements in the fields of 
biology, psychology and manufacturing industry are found to be 
very nearly of this type. 


(a.2) Bell-shaped moderately skew distribution, for which the fre- 
quency-density decreases at different rates on the two sides of the 
maximum. In case the rate of decrease is faster to the left than to 
the right, so that the right-hand tail of the distribution is longer 
than the left-hand one, the distribution is called positively skew. On 
the other hand, in case the rate of decrease is faster to the right 
of the maximum than to the left, so that the left-hand tail is longer, 
the distribution is called negatively skew. 

Most distributions to be encountered in practice will belong to 
this type. Some common examples are the distribution of equal- 
sized plots according to yield of a crop, age distribution of bride- 
grooms for marriages contracted in a year, distribution of age at 
death for people dying of a disease, say pneumonia, etc. 


Fig. 6.7 Bell-shaped frequency curves (positively skew, 
negatively skew and symmetrical). [Horizontal axis: variable.] 

(b) Multimodal distribution 
A distribution of this type will have more than one local maximum. 
This type is rarely encountered and is generally taken as 
indicative of the non-homogeneity of the data. Consider, for 
example, height data relating to adult males of three distinct ethnic 
groups, like Bengalees, Punjabis and Marathis, that are mingled 
together, and suppose a frequency distribution is formed with the 
combined data. This frequency distribution will assume a multi- 
modal (rather, a trimodal) form. 


Fig. 6.8 A multimodal frequency curve. [Horizontal axis: variable; 
vertical axis: relative frequency density.] 


(c) J-shaped distribution 

This is the extreme type of an asymmetric frequency distribution. 
A distribution of this kind has its highest frequency or frequency- 
density at one end of the range, and the frequency or frequency- 
density gradually decreases as one moves to the right or to the 
left (as the case may be) of the point. 

Income distribution of earners, age distribution of the population 
of a country, etc., are found to closely resemble this form. 


Fig. 6.9 J-shaped frequency curves. 


(d) U-shaped distribution 

For this type of frequency distribution, the frequency or frequency- 
density is lowest for some value at the middle of the range and 
increases, either at the same rate or at different rates, as the value 
of the variable shifts to the left or to the right of this point. 


Fig. 6.10 A U-shaped frequency curve. [Horizontal axis: variable; 
vertical axis: relative frequency density.] 

This is a rare form of distribution. If one classifies the deaths 
occurring in a large population according to the age of the deceased, 
one will end up with a frequency distribution that resembles 
this form. Yule and Kendall [4] present meteorological data to 
show that the frequency distribution of days in a month according 
to the degree of cloudiness at a place (Greenwich in their example) 
is also expected to be of this type. 


Questions and exercises 


6.1 Explain, with suitable examples, the distinction (a) between 
an attribute and a variable and (b) between a discrete variable and 
a continuous variable. 

6.2 Discuss the different considerations to be kept in view in 
drawing up a frequency distribution for data on a continuous 
variable. 




6.3 Explain, with suitable examples, the following terms : 
Frequency, relative frequency, cumulative frequency, frequency- 
density, class mark, class limits and class boundaries. 


6.4 Discuss the different modes of graphical representation of 
frequency distributions of different types. 

6.5 Describe how you can represent a frequency distribution 
diagrammatically in terms of the cumulative frequency, covering 
both the discrete variable and continuous variable cases. 

6.6 What is a frequency curve ? Give the broad categories under 
which frequency distributions may be put, indicating in each case 
the nature of the frequency curve. 

6.7 The word-length for each of 90 words in a poem by Tagore 
is shown below : 


5 4 3 5 8 6 6 3 4 
3 4 4 5 8 2 6 7 6 
4 5 6 4 9 6 4 2 2 
2 9 2 3 3 3 2 4 7 
7 2 4 4 4 3 4 4 2 
4 4 9 3 7 4 5 12 6 
3 5 2 5 10 3 5 7 3 
3 3 6 2 5 3 3 3 2 
4 5 8 5 3 4 4 6 7 
2 8 5 5 5 3 2 4 5 


Construct a frequency table and also obtain the relative 
frequencies and the cumulative frequencies (of the ‘less-than’ type). 
Represent the data in a column diagram, a frequency polygon 
and a cumulative frequency diagram. 
6.8 On the basis of the table constructed in Exercise 6.7, answer 
the following questions : 
(a) What is the proportion of words with 9 letters? 
(b) What is the number of words with 3 letters or less, and 
what is the number of words with 5 letters or more ? 


(c) What is the number of words with not less than 4 and not 
more than 6 letters ? 

Ans. (a) 0.0333; (b) 32, 38; (c) 44. 


6.9 With the data shown below, form a frequency distribution 
with six classes. Show the frequencies, the relative frequencies and 
the cumulative frequencies (of both the less-than and the greater- 
than type). Finally, represent the distribution by means of suitable 
diagrams. 

Life (in hours) of 100 electric bulbs 


  511    991  1,177  1,016    600    777    895    749  1,067    980 
  923  1,314  1,108  1,137    906  1,230  1,099  1,242    803  1,131 
  918  1,240  1,057    980    992    763    759  1,394  1,111  1,117 
1,143    808    948    857    962    922    817  1,057    665  1,171 
  936  1,068    750    873  1,139  1,127  1,163    934    515    907 
1,061  1,198  1,027  1,081    991  1,155  1,199    806    950  1,262 
  848  1,293    956  1,140    885  1,330  1,166  1,333  1,146    933 
  820    880    982    912  1,100  1,293  1,192  1,371  1,023  1,298 
1,059  1,092  1,091  1,182    699    803  1,069    922  1,245    706 
1,053  1,001    939  1,248    850    985  1,219    945  1,012    846 


SUGGESTED READING 


[1] Mills, F. C. Statistical Methods (Ch. 3). H. Holt, 1955. 

[2] Simpson, G. and Kafka, F. Basic Statistics (Chs. 8, 9). W. W. 
Norton, 1957, and Oxford & IBH, 1965. 

[3] Wallis, W. A. and Roberts, H. V. Statistics : a New Approach 
(Ch. 6). Methuen, 1957. 

[4] Yule, G. U. and Kendall, M. G. An Introduction to the Theory of 
Statistics (Ch. 4). Charles Griffin, 1953. 


7 MEASURES OF 
CENTRAL TENDENCY 


7.1 Descriptive measures of statistics 

It was noted in the previous chapter that the primary purpose of 
statistical methods is to summarise the information contained in any 
set of data. The purpose is served to some extent by classifying the 
data in the form of a frequency distribution and using various graphs. 
When the data relate to a variable, the process of summarisation can 
be taken a long step further by using certain descriptive measures. 
The aim is to focus on certain features of the data which will 
describe their nature in a general way. The two most important 
features are central tendency and dispersion. Two other features, which 
are also of some importance, are skewness and kurtosis. 


7.2 Central tendency 

Quite often there will be found in the data a tendency, not- 
withstanding their variability, to cluster around a central value. 
This will be apparent from Table 7.1, where the figures seem to 
cluster around some point between 1,200 gm. and 1,300 gm. 


TABLE 7.1 

YIELD PER PLANT FOR 12 TOMATO PLANTS OF A 
PARTICULAR VARIETY 

Plant No.    Yield (gm.)      Plant No.    Yield (gm.) 

    1           1,216             7           1,202 
    2           1,374             8           1,372 
    3           1,167             9           1,278 
    4           1,232            10           1,141 
    5           1,407            11           1,221 
    6           1,453            12           1,329 


In such a case, it would be legitimate to use a single value, the 
central value, to represent the whole set of figures. Such a represen- 
tative or typical value of a variable is called a measure of central 
tendency or an average. 

The idea of average is a familiar one. One has this idea in mind, 
maybe rather vaguely, when one says that Germans live longer than 
Indians. By this one does not mean that the longevity of every 
German is higher than that of every Indian. All that is meant is 
that the longevity of a typical German is higher than the longevity 
of a typical Indian—in other words, the average longevity of 
Germans is higher than the average longevity of Indians. In connec- 
tion with a frequency distribution, an average is also referred to as a 
measure of location, because it determines, as it were, the position
of the distribution on the axis of the variable. 

Three types of average are in general use, viz. arithmetic mean, 
median and mode. 


7.3 Arithmetic mean 

The arithmetic mean (or, simply, mean) of a variable is obtained 
by dividing the sum of its given values by their number. If the
variable is denoted by x and if n values of x are given, viz.
$x_1, x_2, \ldots, x_n$, then the arithmetic mean of x is

$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i. \tag{7.1}$$

Example 7.1 For the data of Table 7.1, the arithmetic mean is

$$(1{,}216 + 1{,}374 + \cdots + 1{,}329)/12 = 15{,}392/12 = 1{,}282.67 \text{ gm.}$$
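The arithmetic of Example 7.1 can be checked with a short sketch. The function name `arithmetic_mean` and the variable names are illustrative choices, not part of the text; the data are the twelve yields of Table 7.1.

```python
# Yields (gm.) of the 12 tomato plants of Table 7.1.
yields = [1216, 1374, 1167, 1232, 1407, 1453,
          1202, 1372, 1278, 1141, 1221, 1329]


def arithmetic_mean(values):
    """Formula (7.1): the sum of the given values divided by their number."""
    return sum(values) / len(values)


total = sum(yields)                   # numerator of formula (7.1)
mean_yield = arithmetic_mean(yields)  # arithmetic mean in gm.
print(total, round(mean_yield, 2))
```
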


Example 7.2 Now consider Table 6.4. By using the above
formula, the arithmetic mean of the number of peas per pod is found
to be

$$(4 + 3 + 5 + \cdots + 2 + 2)/198 = 683/198 = 3.45.$$

But the computation could be simplified if we took the help of
Table 6.6. This table shows that in the sum the variate value 1
occurs 4 times, the value 2 occurs 33 times, and so on. Hence the
sum of the given values would be

$$1 \times 4 + 2 \times 33 + \cdots + 7 \times 1 = 683,$$


as before. 
This shows that if the values of a discrete variable are arranged
in a frequency table, each class being defined by a single distinct
number, then formula (7.1) may be expressed in the alternative
form:

$$\bar{x} = \sum_i x_i f_i / n, \tag{7.2}$$

where $x_i$ is the value of x in the ith class, $f_i$ is the corresponding
frequency and $n = \sum_i f_i$.

When we have a frequency table with more than one variate 
value in a class, formula (7.2) may still be used, $x_i$ now denoting the
mid-value of the ith class-interval. But in this case (7.2) will give
only an approximate value of the mean. The error of approximation
will, however, be negligible provided the range of x is very large 
compared to the width of the class-intervals. 
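As a sketch of formula (7.2), the function below computes a mean from class values and frequencies. The name `grouped_mean` is hypothetical; the data are the weekly-wage table of Exercise 7.17, with the two missing frequencies set to the answers stated there (41 and 12).

```python
def grouped_mean(class_values, frequencies):
    """Formula (7.2): sum of x_i * f_i divided by n, where n is the sum of f_i."""
    n = sum(frequencies)
    return sum(x * f for x, f in zip(class_values, frequencies)) / n


# Weekly wages (Rs.) and frequencies from Exercise 7.17,
# with f* = 41 and f** = 12 filled in from the stated answer.
wages = [65, 70, 75, 80, 85, 90, 95]
freqs = [5, 48, 41, 30, 12, 8, 6]

mean_wage = grouped_mean(wages, freqs)
print(sum(freqs), round(mean_wage, 2))
```

When each class holds a single distinct value, as here, the result is exact; with class-intervals the same function applied to mid-values gives only the approximation described above.
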

It should be noted, however, that for a continuous variable 
formula (7.1) also will give only an approximate result because of the 
inevitable errors of observation contained in the data. 

Formulae (7.1) and (7.2), with some simplifying modifications,
will be used for computing the mean and some other measures for a 
continuous variable in Examples 9.1 and 9.2. 

Some important properties of the arithmetic mean may be 
mentioned : 

(a) By definition,

$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i, \quad\text{or}\quad \sum_{i=1}^{n} x_i = n\bar{x}.$$

Subtracting $\bar{x}$ from each term on the left-hand side and modifying
the right-hand side accordingly, we have

$$\sum_{i=1}^{n} (x_i - \bar{x}) = n\bar{x} - n\bar{x} = 0. \tag{7.3}$$

This shows that the sum of the deviations of the given values of a
variable from its mean is necessarily zero.
(b) Suppose the given values of x are all equal to a constant a.
Then $\sum_{i=1}^{n} x_i = na$, and hence

$$\bar{x} = a.$$

Thus the mean of a variable whose given values are all equal must 
also be the same as their common value. 




(c) Let y = a + bx. Corresponding to the value $x_i$ of x, there
will be the value $y_i = a + b x_i$ of y. Hence

$$\bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i = \frac{1}{n}\sum_{i=1}^{n} (a + b x_i) = \frac{na + b\sum_{i=1}^{n} x_i}{n} = a + b\bar{x}. \tag{7.4}$$

Thus, if y = a + bx, a linear function of x, then the arithmetic means
of y and x are related in the same way as y and x themselves are.
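Properties (a) and (c) can be verified numerically. The sketch below uses exact rational arithmetic so that the checks are not blurred by rounding; the constants `a` and `b` are arbitrary values chosen for illustration, and the data are again those of Table 7.1.

```python
from fractions import Fraction

# Yields (gm.) of Table 7.1.
yields = [1216, 1374, 1167, 1232, 1407, 1453,
          1202, 1372, 1278, 1141, 1221, 1329]


def mean(values):
    """Exact arithmetic mean as a Fraction."""
    return Fraction(sum(values), len(values))


x_bar = mean(yields)

# Property (a), formula (7.3): deviations from the mean sum to zero.
deviation_sum = sum(x - x_bar for x in yields)

# Property (c), formula (7.4): y = a + b*x implies y_bar = a + b*x_bar.
a, b = Fraction(-1200), Fraction(1, 10)  # arbitrary illustrative constants
y_values = [a + b * x for x in yields]
y_bar = mean(y_values)

print(deviation_sum, y_bar == a + b * x_bar)
```
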

(d) Let there be two sets of values of x, the number of values
in the two sets being $n_1$ and $n_2$ and the means being $\bar{x}_1$ and $\bar{x}_2$,
respectively.

Let $x_{1j}$ (j = 1, 2, ..., $n_1$) denote the values in the first set and
$x_{2j}$ (j = 1, 2, ..., $n_2$) those in the second set. The sum of x values
in the two sets taken together will then be equal to the sum of values
in the first set plus the sum of values in the second set, i.e.

$$\sum_{j=1}^{n_1} x_{1j} + \sum_{j=1}^{n_2} x_{2j} = n_1\bar{x}_1 + n_2\bar{x}_2.$$

But this sum must be equal to $(n_1 + n_2)$ times the grand mean of x.
Hence the grand mean is

$$\bar{x} = \frac{n_1\bar{x}_1 + n_2\bar{x}_2}{n_1 + n_2}.$$

Generally, if there be t sets of values of x, containing $n_1, n_2, \ldots, n_t$
values and having means $\bar{x}_1, \bar{x}_2, \ldots, \bar{x}_t$, then the grand mean of x is

$$\bar{x} = \sum_{i=1}^{t} n_i\bar{x}_i \Big/ \sum_{i=1}^{t} n_i. \tag{7.5}$$
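Formula (7.5) translates directly into code. The following is a minimal sketch in which the function name `grand_mean` is an illustrative choice; the data are the section sizes and mean earnings of Exercise 7.14.

```python
def grand_mean(sizes, means):
    """Formula (7.5): sum of n_i * mean_i divided by the sum of n_i."""
    return sum(n * m for n, m in zip(sizes, means)) / sum(sizes)


# Exercise 7.14: workers per section and mean weekly earnings (Rs.).
workers = [105, 184, 130, 93, 124]
earnings = [63.84, 65.12, 65.27, 68.19, 64.22]

overall = grand_mean(workers, earnings)
print(round(overall, 2))
```

Note that the grand mean is not the simple average of the five section means unless the sections are of equal size.
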


(e) Suppose the values of two variables, x and y, are given for
each of n individuals. If a new variable is formed, viz.

$$z = ax + by,$$

then for the ith individual the value of the new variable is

$$z_i = a x_i + b y_i.$$

Summing over all individuals, we have

$$\sum_{i=1}^{n} z_i = a\sum_{i=1}^{n} x_i + b\sum_{i=1}^{n} y_i,$$

so that

$$\bar{z} = a\bar{x} + b\bar{y}. \tag{7.6}$$




7.4 Median 

If the given values of x are arranged in an increasing or decreasing 
order of magnitude, then the middle-most value in this arrangement 
is called the median of x. (The median may alternatively be defined 
as a value of x such that half of the given values of x are smaller 
than or equal to it and half are greater than or equal to it.)

When the number of values, n, is odd, the middle-most value,
i.e. the $\frac{n+1}{2}$th value, in the arrangement will be the unique
median of x.

On the other hand, when n is even, there will be no unique
median. For any number between the $\frac{n}{2}$th and $\left(\frac{n}{2}+1\right)$st values of x
in the arrangement, being regarded as middle-most, is now to be
taken as a median, according to the above definition. However, for
the sake of definiteness, the arithmetic mean of the $\frac{n}{2}$th and $\left(\frac{n}{2}+1\right)$st
values is here accepted as the median of x, by convention.

Example 7.3 The yields (in gm.) of barley from 7 plots, of size 

one sq. yd. each, were found to be 
180, 191, 175, 111, 154, 141 and 176. 

To determine the median yield, these are first arranged in an

increasing order of magnitude, i.e. as 
111, 141, 154, 175, 176, 180, 191. 

The median is the 4th value in this arrangement, i.e. 175 gm. 

Example 7.4 The number of letters (word-length) in each of six 
words taken from a dictionary is shown below : 

7, 9, 5, 10, 4 and 8. 

Arranged in increasing order, the values will be 

4, 5, 7, 8, 9, 10. 

The number of values being six, any number between the 3rd and 
the 4th (i.e. between 7 and 8) may be considered a median of the 
variable. For the sake of definiteness, one may here take their 
arithmetic mean, 7.5, as the median. 

In the above arrangement, if the 3rd and 4th values were both 7, 
then no difficulty would arise, because 7, being the middle-most 


value, would then be the only median.
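The rule for ungrouped data, including the convention adopted for an even number of values, may be sketched as follows. The function name `median` is an illustrative choice; the data are those of Examples 7.3 and 7.4.

```python
def median(values):
    """Middle-most value; for even n, the mean of the two middle values."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # Odd n: the ((n+1)/2)th value (index mid when counting from 0).
        return ordered[mid]
    # Even n: conventional median, mean of the (n/2)th and (n/2 + 1)st values.
    return (ordered[mid - 1] + ordered[mid]) / 2


barley = [180, 191, 175, 111, 154, 141, 176]  # Example 7.3
word_lengths = [7, 9, 5, 10, 4, 8]            # Example 7.4

print(median(barley), median(word_lengths))
```
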


When the observations are grouped into a frequency distribution, 
the median can be obtained on the basis of the cumulative frequencies. 
For the cumulative frequency table itself provides an arrangement of 
the observations in an increasing or decreasing order of magnitude, 
according as the cumulative frequencies are of the ‘less-than’ or of 
the ‘greater-than’ type. 

Example 7.5 Take the frequency distribution of number of peas 
per pod given in Table 6.6. The cumulative frequencies in Table 6.7 
indicate that if the given values of the variable are arranged in an 
increasing order of magnitude, then the first four values will be all 
equal to 1, the 5th to the 37th will be equal to 2, the 38th to the 
113th will be equal to 3, and so on. The total number of observations
being 198, the median is any number between the 99th and
100th observations, which are both equal to 3. Hence the median 
for this distribution is 3. 

The frequency distribution of a continuous variable needs special 
attention. The median here may be supposed to be the value for 
which the cumulative frequency is n/2, the reason for which will be 
apparent from the second form of the definition of median. By going
through the cumulative frequency table, we can then determine in 
which interval the median lies. Suppose the cumulative frequencies 
are of the ‘less-than’ type. The following formula will then give an 
approximate value of the median. 

Let us denote the lower and upper class-boundaries of the class
containing the median by $x_l$ and $x_u$ and the corresponding cumulative
frequencies by $n_l$ and $n_u$, respectively. If we assume that cumulative
frequency changes from $n_l$ to $n_u$ between $x_l$ and $x_u$ at a constant
rate, i.e. if we assume that cumulative frequency is a linear function
of x between $x_l$ and $x_u$, then the median, which is the value with
cumulative frequency n/2, will satisfy the relation

$$\frac{\mathrm{Mi} - x_l}{x_u - x_l} = \frac{n/2 - n_l}{n_u - n_l}.$$

This gives

$$\mathrm{Mi} = x_l + \frac{n/2 - n_l}{f_0} \times c, \tag{7.7}$$

where c and $f_0$ are the width and the frequency of the class-interval
containing the median, Mi.


The same value may also be obtained geometrically, from the
ogive of the frequency distribution. The median will be given by
the abscissa of the point on the ogive for which the ordinate is n/2. 


Example 7.6 Let us consider the frequency distribution of Table 
6.10. Here the total frequency is n = 177, so that n/2 = 88.5.

Hence, on going through the cumulative frequencies of the ‘less-than’
type, it is found that the median lies between $x_l$ = 164.55 cm.
and $x_u$ = 169.55 cm., for which the cumulative frequencies are $n_l$ = 86
and $n_u$ = 146. Here c = 5 and $f_0$ = 60. Therefore, formula (7.7)
gives

$$\mathrm{Mi} = 164.55 + \frac{88.5 - 86}{60} \times 5 = 164.55 + 0.208 = 164.758 \text{ cm.}$$
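The interpolation of formula (7.7) can be written as a one-line function. This is only a sketch: the name `grouped_median` and its parameter names are assumptions made here, mirroring the symbols $x_l$, c, $f_0$, $n_l$ and n of the text, and the figures are those quoted in Example 7.6.

```python
def grouped_median(x_l, c, f_0, n_l, n):
    """Formula (7.7): linear interpolation within the median class.

    x_l : lower boundary of the class containing the median
    c   : width of that class
    f_0 : frequency of that class
    n_l : 'less-than' cumulative frequency at x_l
    n   : total frequency
    """
    return x_l + (n / 2 - n_l) / f_0 * c


# Figures quoted in Example 7.6 (height distribution, Table 6.10).
mi = grouped_median(x_l=164.55, c=5, f_0=60, n_l=86, n=177)
print(round(mi, 3))
```
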

Alternative method: The median for the same distribution is
obtained graphically in Fig. 7.1. The ogive of the distribution is
first drawn. Then through the point n/2 = 88.5 on the vertical axis
a line parallel to the x-axis is taken, which intersects the ogive at P.
From P a perpendicular is let fall on the x-axis. The point at which
it meets the x-axis is the median, Mi.

From Fig. 7.1 Mi is thus found to be about 164.76 cm., as before.


[Figure: ogive with cumulative frequency on the vertical axis and height (cm) on the horizontal axis; the median Mi is marked on the horizontal axis.]



Fig. 7.1. Ogive showing the location of the median for the 
height distribution of Indian adult males (Table 6.10). 




7.5 Mode 

The mode of a variable is the value of the variable having the 
highest frequency*. This definition, properly speaking, applies to a 
discrete variable only. 

Example 7.7 Consider the data of Table 6.4 again. It is found
from Table 6.6 that the value 3 is the one with the highest frequency. 
Hence the mode of the number of peas per pod is 3. 

For a continuous variable, the above definition needs to be
modified. The mode here is the value of the variable with the
highest frequency-density* corresponding to the ideal distribution 
which would be obtained if the total frequency were increased 
indefinitely and if, at the same time, the width of the class-intervals 
were decreased indefinitely. Graphically, it may be looked upon as 
the abscissa corresponding to the highest ordinate in the frequency 
curve (which is the limiting form of a histogram or a frequency 
polygon) of the ideal distribution. In Fig. 7.2 we have a frequency 
curve, where the mode is denoted by Mo.


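[Figure: frequency curve with frequency-density on the vertical axis and the variable on the horizontal axis; the mode Mo is marked on the horizontal axis.]

Fig. 7.2 A frequency curve.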

For a frequency distribution obtained from a finite number of
observations, like that in Table 6.10, the mid-value of the class-interval
having the highest frequency may be approximately taken
to be the mode. If this class-interval extends from $x_l$ to $x_u$, then the
mode is approximately given by

$$\mathrm{Mo} = \frac{x_l + x_u}{2} = x_l + c/2,$$

c being the width of the interval.

* If there are more than one value each with the highest frequency or
frequency-density (as the case may be), then the mode is not defined.


[Figure: histogram with frequency-density on the vertical axis and the variable on the horizontal axis.]


Fig. 7.3 Histogram of an observed distribution. The 
hatched portion shows the frequencies in the modal class 
and the two adjacent classes. 


This formula may generally be improved upon by considering,
together with the class having the highest frequency, the class
immediately preceding and the class immediately following it. Supposing
these classes are of equal width c and have frequencies $f_0$, $f_{-1}$ and $f_1$
(vide Fig. 7.3), one would take $x_l + c/2$ as the mode, if $f_0 - f_{-1}$
and $f_0 - f_1$ were equal in magnitude. If, on the other hand, $f_0 - f_{-1}$
be smaller (greater) than $f_0 - f_1$, one would suppose that the mode
is nearer (farther from) $x_l$ than $x_u$.

Mathematically, this amounts to supposing that

$$\frac{\mathrm{Mo} - x_l}{x_u - \mathrm{Mo}} = \frac{f_0 - f_{-1}}{f_0 - f_1}.$$

This leads to

$$\mathrm{Mo} = x_l + \frac{f_0 - f_{-1}}{2f_0 - f_{-1} - f_1} \times c. \tag{7.8}$$


Another method of determining the mode is to make use of the 
empirical relation 


$$\bar{x} - \mathrm{Mo} = 3(\bar{x} - \mathrm{Mi}), \tag{7.9}$$


which is found to be approximately valid for moderately skew distri- 
butions (vide Chapter 9). Relation (7.9) states that the amount by 
which the mean exceeds (is smaller than) the mode is three times the
amount by which the mean exceeds (is smaller than) the median. 
Given the mean and the median, an approximate value of the mode 
may thus be obtained, viz. the value 

$$3\,\mathrm{Mi} - 2\bar{x}.$$

Example 7.8 Consider the height distribution of Indian adult
males (Table 6.10). The class-interval with the highest frequency
has boundaries $x_l$ = 164.55 cm. and $x_u$ = 169.55 cm. The frequencies
in the two adjoining classes, which are also of the same width c = 5
cm., are $f_{-1}$ = 58 and $f_1$ = 27. As to the modal class, it has frequency
$f_0$ = 60. According to formula (7.8), then, the mode is approximately

$$164.55 + \frac{60 - 58}{2 \times 60 - 58 - 27} \times 5 = 164.55 + 0.286 = 164.836 \text{ cm.}$$

Alternative method: For this distribution the mean is found to be
$\bar{x}$ = 164.734 cm. (vide Example 9.2) and the median is found to be Mi =
164.758 cm. (vide Example 7.6). Relation (7.9), therefore, gives

$$\mathrm{Mo} = 3 \times 164.758 - 2 \times 164.734 = 164.806 \text{ cm.}$$

approximately.

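Both methods of Example 7.8 can be reproduced with a short sketch. The function names `grouped_mode` and `empirical_mode` and the parameter names are illustrative assumptions; the inputs are the figures quoted in the example.

```python
def grouped_mode(x_l, c, f_0, f_prev, f_next):
    """Formula (7.8): mode located within the modal class.

    f_prev and f_next are the frequencies of the classes immediately
    preceding and following the modal class (f_{-1} and f_1 in the text).
    """
    return x_l + (f_0 - f_prev) / (2 * f_0 - f_prev - f_next) * c


def empirical_mode(mean, median):
    """Relation (7.9) solved for the mode: Mo = 3*Mi - 2*mean."""
    return 3 * median - 2 * mean


mo_interpolated = grouped_mode(x_l=164.55, c=5, f_0=60, f_prev=58, f_next=27)
mo_empirical = empirical_mode(mean=164.734, median=164.758)
print(round(mo_interpolated, 3), round(mo_empirical, 3))
```

When the two neighbouring frequencies are equal, `grouped_mode` returns the mid-value $x_l + c/2$, in agreement with the simpler rule given earlier.
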
7.6 Comparison of mean, median and mode 

The mean is rigidly defined. So is the median except when 
there are an even number of observations, and so is the mode except
when there are more than one value with the highest frequency or 
frequency-density. 

The actual computation of all three measures involves almost 
the same amount of labour. But the determination of mode in the 
continuous case is impossible if only a few values of the variable are 
given. Even when a large number of observations grouped into a fre- 
quency distribution are available, the mode is difficult to determine : 
the method we have outlined above is not wholly satisfactory. 

The general nature of the mean, like that of the median or mode, 
is easily comprehensible, the mean being the value that would be
possessed by each of the given individuals if the total value ($\sum x_i$)
were distributed equally among them.



Although in determining each of the three measures all the
observations have to be taken into account, in the actual computa- 
tion only the mean directly uses all the observations—so much so that 
its value changes when any one of the observations is changed. 
But this is not the case with the median or the mode: some 
observations may be altered and yet the median or the mode may 
remain the same. 

Moreover, the mean, as we have seen, has certain properties
which enable it to be used readily in theoretical work. But the
median and mode do not possess any such desirable property.

Besides, of the three measures, the mean is generally the one
that is the least affected by sampling fluctuations, although in some 
particular situations the median or the mode may be superior in 
this respect. (The term ‘sampling fluctuations’ will be explained in 
Section 15.2.) 

It is, therefore, obvious that, by and large, the mean may be 
regarded as the best measure of central tendency. 

In some special cases, however, the use of the mean is not to be 
recommended. Two such cases are considered below. 

Suppose the values of the variable are given in the form of a
frequency table of which one or both of the terminal classes are open
(the table given in Exercise 7.20 presents a case in point). Here the
computation of the mean is impossible, because the class-marks of
those terminal classes are indeterminate. But this will generally be
no bar to the computation of the median or mode.

Again, let the weights of 8 persons be 138, 143, 141, 139, 152, 148,
160 and 267 lb. Here the mean is 161 lb., but this cannot be said to
be a representative value, because seven out of the eight given values
are smaller than 161. In cases of this sort, where the data contain
a few extreme values widely different from the majority of the values,
the mean should not be used. In the present example, the median
(145.5 lb.) would be the appropriate average.

In such a case the mean will also be subject to higher sampling 
fluctuations than the median. The following simple, though rather 

artificial, example illustrates this point. Let us consider the drawing 
of samples of size 3 from a set of 4 values : 
15, 30, 33, 36. 




The different possible samples and the corresponding means and 
medians are shown below: 


Serial No. | Sample values | Mean | Median
    1      |  15, 30, 33   |  26  |   30
    2      |  15, 30, 36   |  27  |   30
    3      |  15, 33, 36   |  28  |   33
    4      |  30, 33, 36   |  33  |   33

Clearly, the column of sample means shows greater variability than
the column of medians.
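The enumeration of samples can be reproduced with the standard library. In the sketch below the spread of each column is summarised by its range (highest minus lowest value); that choice of summary is an assumption made for illustration, since the text only compares the columns by inspection.

```python
from itertools import combinations
from statistics import mean, median

population = [15, 30, 33, 36]

# All possible samples of size 3 drawn without replacement.
samples = list(combinations(population, 3))
sample_means = [mean(s) for s in samples]
sample_medians = [median(s) for s in samples]

# Range of each column, used here as a simple indicator of variability.
mean_spread = max(sample_means) - min(sample_means)
median_spread = max(sample_medians) - min(sample_medians)

for s, m, md in zip(samples, sample_means, sample_medians):
    print(s, m, md)
print(mean_spread, median_spread)
```
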

The median has the additional property that the individual having 
the median value remains the same under any transformation that 
leaves the order of the values unchanged (vide Exercise 7.9). Hence 
whenever the order of the values is considered to be of importance, 
the median will be preferred to both the mean and the mode. 


7.7 Other measures of central tendency 

Besides the arithmetic mean, median and mode, there are two 
other averages which are relatively unimportant but may be appropriate
to particular situations. These are the geometric mean and the
harmonic mean. 


If a variable x has n given values, $x_1, x_2, \ldots, x_n$, then its
geometric mean $x_g$ is defined by

$$x_g = \left(\prod_{i=1}^{n} x_i\right)^{1/n}. \tag{7.10}$$

It is immediately seen that

$$\log x_g = \frac{1}{n}\sum_{i=1}^{n} \log x_i. \tag{7.11}$$


Thus the logarithm of the geometric mean of a variable is the 
arithmetic mean of its logarithm. 

Consider two variables, x and y, whose values are given for each 
of n individuals. Let x; and y; (i=1, 2, ...... ,n) be the values of x 
and y for the ith individual. Then 


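$$\left(\prod_{i=1}^{n} \frac{x_i}{y_i}\right)^{1/n} = \frac{\left(\prod_{i=1}^{n} x_i\right)^{1/n}}{\left(\prod_{i=1}^{n} y_i\right)^{1/n}};$$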


that is, the geometric mean of the ratio of x and y is the ratio of 
their geometric means. Owing to this property of the geometric 




mean, it is sometimes preferred for averaging ratios of two variables, 
for instance, in constructing price index numbers (vide relevant 
chapter in Vol. 2). Some consider the geometric mean to be the 
natural form of average for averaging price-relatives since it gives 
equal emphasis to the same ratio of increase and of decrease in price. 
Suppose the price-relative for one commodity is 5 (that is, its price
has increased 5 times) and for another commodity it is 1/5 (that is, its
price now is 1/5th of the original price). If equal importance is to
be given to these two ratios, then their average should be 1. This
criterion is satisfied by the geometric mean only.


Example 7.9 The ratios of the prices in 1964 to those in 1952 for
four commodities are 0.92, 1.25, 1.75 and 0.85. To get the average
price-ratio using the geometric mean, we see that

$$\log x_g = (\log 0.92 + \log 1.25 + \log 1.75 + \log 0.85)/4$$
$$= (\bar{1}.96379 + 0.09691 + 0.24304 + \bar{1}.92942)/4$$
$$= 0.05829 = \log 1.1436.$$

Thus

$$x_g = 1.144.$$
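Formula (7.11) suggests computing the geometric mean through logarithms. The sketch below does so with natural logarithms rather than the common logarithms of the worked example; the base does not affect the result. The function name `geometric_mean` is an illustrative choice.

```python
import math


def geometric_mean(values):
    """Formula (7.11): antilogarithm of the arithmetic mean of the logarithms."""
    return math.exp(sum(math.log(x) for x in values) / len(values))


price_ratios = [0.92, 1.25, 1.75, 0.85]  # Example 7.9
x_g = geometric_mean(price_ratios)

# Price-relatives 5 and 1/5 should average to 1 under the geometric mean.
balanced = geometric_mean([5, 1 / 5])

print(round(x_g, 3), round(balanced, 6))
```
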

The geometric mean also comes in if one wants to determine the 
value of a variable at the mid-point of a time interval when the 
variable changes over time exponentially. Thus, if the values at two
points of time, say 0 and t, be a and $ar^t$, then its value at the middle
of the interval, i.e. at t/2, is

$$ar^{t/2} = \sqrt{a \times ar^t},$$

which is the geometric mean for the values at the terminal points of
the interval.

Formula (7.10) shows that the geometric mean vanishes if even a
single value of the variable happens to be zero. Moreover, the
geometric mean is of too abstract a nature and is not of common use
in statistical work.

The harmonic mean ($x_h$) of a variable x, with given values $x_i$
(i = 1, 2, ..., n), is defined by

$$x_h = n \Big/ \sum_{i=1}^{n} \frac{1}{x_i} \tag{7.12}$$

or

$$\frac{1}{x_h} = \frac{1}{n}\sum_{i=1}^{n} \frac{1}{x_i}. \tag{7.13}$$

The second formula shows that the reciprocal of the harmonic mean 
of a variable is the arithmetic mean of its reciprocal. 

Sometimes the variable may be in the form ‘x per unit y’, e.g.
miles per hour, rupees per maund, lb. per cubic foot, etc. In such
cases, the harmonic mean would be the proper average if equal 
units of x were considered, while the arithmetic mean would be 
appropriate if equal units of y were considered. This may be 
illustrated with the following example : 

Suppose a train moves n equal distances, each of s miles, say,
with speeds $v_1, v_2, \ldots, v_n$ miles per hour. The average speed is, of
course, the total distance covered divided by the total time taken.
Thus the average speed is

$$\frac{ns}{\dfrac{s}{v_1} + \dfrac{s}{v_2} + \cdots + \dfrac{s}{v_n}} = \frac{n}{\sum_{i=1}^{n} \dfrac{1}{v_i}},$$

the harmonic mean of the given speeds.

If, however, the train moves for n equal time intervals, each of
length t hours, say, with the above speeds, then the average speed
will be

$$\frac{v_1 t + v_2 t + \cdots + v_n t}{nt} = \frac{v_1 + v_2 + \cdots + v_n}{n},$$

the arithmetic mean of the given speeds.

Example 7.10 Suppose milk is sold at the rates of 1.80, 2.00, 2.20
and 2.50 rupees per litre in four different months. Assuming that
equal amounts of money are spent on milk by a family in the four
months, the average price in rupees per litre will be the harmonic
mean of the given figures, viz.

$$x_h = \frac{4}{\dfrac{1}{1.80} + \dfrac{1}{2.00} + \dfrac{1}{2.20} + \dfrac{1}{2.50}} = 2.09 \text{ approximately.}$$
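The reasoning of Example 7.10 can be checked in two ways: by formula (7.12), and directly as total money spent divided by total litres bought. In the sketch below the monthly outlay `spend` is a hypothetical figure; any equal amount gives the same average price.

```python
def harmonic_mean(values):
    """Formula (7.12): n divided by the sum of the reciprocals."""
    return len(values) / sum(1 / x for x in values)


milk_prices = [1.80, 2.00, 2.20, 2.50]  # rupees per litre, Example 7.10
avg_price = harmonic_mean(milk_prices)

# Direct check: the same amount is spent each month (hypothetical figure).
spend = 10.0
litres_bought = sum(spend / p for p in milk_prices)
direct_price = len(milk_prices) * spend / litres_bought

print(round(avg_price, 2), round(direct_price, 2))
```

If equal quantities of milk, rather than equal amounts of money, were bought each month, the arithmetic mean of the prices would be the appropriate average instead.
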

In some cases, instead of simple means (simple arithmetic,
geometric or harmonic means), weighted means are used. The values
of the variable may not have the same importance. Hence one
considers, along with the values $x_1, x_2, \ldots, x_n$, a set of weights
$w_1, w_2, \ldots, w_n$, where $w_i$ indicates the importance of the value $x_i$
in the given context. The weighted arithmetic mean is then

$$\frac{\sum_i w_i x_i}{\sum_i w_i}. \tag{7.14}$$


Obviously, formula (7.2) also gives in a way a mean of this type.
It is the weighted arithmetic mean of the class-marks of x with
weights equal to the corresponding class-frequencies. Another
frequently used weighted arithmetic mean is the cost of living index
in its usual form (vide the chapter on Index Numbers in Volume 2).

In the same way, the weighted geometric mean of x for the above
sets of values and weights will be

$$\left(\prod_i x_i^{w_i}\right)^{1/\sum_i w_i} \tag{7.15}$$

and the weighted harmonic mean will be

$$\frac{\sum_i w_i}{\sum_i \dfrac{w_i}{x_i}}. \tag{7.16}$$

Questions and exercises 


7.1 What do you mean by the central tendency of a frequency 
distribution ? Describe the common measures of central tendency.

7.2 What are the desiderata of a good measure of central
tendency ? Compare the mean, the median and the mode in the
light of these desiderata.

7.3 How, in your opinion, should an average change when all
values of the variable are increased or decreased 
(1) by the same amount ? 
(2) in the same proportion ? 
Judge in this light the different averages considered in this 
chapter. 


7.4 State and prove the important properties of the arithmetic
mean. 




7.5 Give some examples where the geometric mean or the 
harmonic mean would be the appropriate type of average. 


7.6 Show that the median of a variable is the abscissa of the
point of intersection of its two ogives (of the ‘less-than’ and ‘greater- 
than’ types). 

7.7(a) There are two sets of values of x. The first set with
$n_1$ values has median $M_1$ and the second with $n_2$ values has median
$M_2$. Show that the median of all $n_1 + n_2$ values taken together must
lie between $M_1$ and $M_2$.

(b) Show that if $\bar{x}_1$ and $\bar{x}_2$ are the means of the two sets, then
the mean $\bar{x}$ of the combined set also lies between $\bar{x}_1$ and $\bar{x}_2$.


7.8 Let x be a variable assuming the values 1, 2, ..., k and
let $F'_1 = n, F'_2, \ldots, F'_k$ be the corresponding cumulative frequencies
of the ‘greater-than’ type. Show that

$$\bar{x} = \frac{1}{n}\sum_{i=1}^{k} F'_i.$$


7.9(a) Suppose x is a variable (discrete or continuous) with
median Mi. If y = g(x) be a monotonically increasing or decreasing
function of x, show that the median of y is g(Mi).

(b) Can a similar statement be made with regard to the mean
or the mode ?


7.10 If y = a + bx and Mo is the mode of x, then show that the
mode of y must be a + bMo.


7.11 Suppose the average of a number of temperature readings 
is to be determined. Show that it does not matter whether the 
readings are in the centigrade or in the Fahrenheit scale if one uses 
the arithmetic mean, but that it does matter if one uses the geometric
or the harmonic mean. 


7.12 Let x be a variable assuming positive values only. Show 
that (a) the arithmetic mean of the reciprocal of x cannot be smaller 
than the reciprocal of its arithmetic mean, and that (b) the arithmetic
mean of the square-root of x cannot be greater than the square-root of 
its arithmetic mean. 




7.13 The mean weight per student in a group of 6 students is 
119 lb. The individual weights of 5 of them are 115 lb., 109 lb.,
129 lb., 117 lb. and 114 lb. What is the weight of the other student
of the group ? Ans. 130 lb.


7.14 A factory has 5 sections, employing 105, 184, 130, 93 and 
124 workers. The mean earnings in a certain week per worker 
are Rs. 63.84, 65.12, 65.27, 68.19 and 64.22 for the 5 sections.
Determine the mean earnings per worker for the whole factory.

Ans. Rs. 65.21.


7.15 The mean monthly income of a gentleman is Rs. 1219 
and his mean monthly expenditure comes out to be Rs. 1193. 
What are his mean monthly savings ? Ans. Rs. 26. 


7.16 The following data show the length of ear-head (in cm.) for
24 ears of a variety of wheat. Compute the mean and the median.

11.5    8.8   10.1
 8.2    9.3   10.0
 9.7   10.1   10.3
10.3   11.3    9.8
10.7    9.8    9.3
 8.6   10.4    9.8
11.3    8.4    9.0
10.7    9.6   11.2

Ans. 9.9 cm.; 9.9 cm.


7.17 For a certain frequency table with total frequency 150, the
mean was found to be Rs. 76.47. But while copying out the table,
a typist left out two of the class frequencies, say f* and f**, so that
the table is given to you in the following form :

Weekly wages in Rs. | 65 | 70 | 75 | 80 | 85  | 90 | 95 | Total
Frequency           |  5 | 48 | f* | 30 | f** |  8 |  6 |  150

Determine f* and f**. Ans. f* = 41, f** = 12.




7.18 The numbers of telephone calls received at an exchange 
per interval for 245 successive one-minute intervals are shown in 
the following frequency distribution : 


Number of calls | Frequency 


14 
21 
25 
43 


NU MURONK oo 
ou 
= 


Total | 245 


Evaluate the mean, median and mode. Ans. 3.76; 4; 4.


7.19 Compute the mean, median and mode for the following 
frequency distribution : 
Frequency distribution of I.Q. for 309 six-year-old children


I.Q. | Frequency
160—169 2 
150—159 3 
140—149 7 
130—139 19 
120—129 37 
110—119 79 
100—109 69 
90— 99 65 
80— 89 17 
70— 79 5 
60— 69 3 
50— 59 2 
40— 49 1 

Total 309 


Ans. 108.48; 108.41; 111.42, according to formula (7.8),
and 108-28, according to formula (7.9). 




7.20 Determine the median and mode for the following distribu- 
tion of monthly income for 580 middle-class people : 


Monthly income (Rs.) | Frequency 


—300 i 53 


Total 580 


Ans. Rs. 410.77; Rs. 419.01.

7.21 The age-distribution of 4,488 Bengali males is given below :


Age last birthday | Frequency
        0         |   156
        1         |   121
        2         |   111
        3         |   106
        4         |   103
       5-9        |   472
      10-14       |   434
      15-19       |   407
      20-24       |   383
      25-29       |   357
      30-34       |   335
      35-39       |   306
      40-49       |   522
      50-59       |   370
      60-69       |   213
      70-79       |    80
      80-89       |    11
      90-99       |     1
      Total       | 4,488

Compute the mean age of Bengali males by means of formula (7.5).
Ans. 27.40 years.


SUGGESTED READING 


[1] Mills, F. C. Statistical Methods (Ch. 4). H. Holt, 1955.

[2] Simpson, G. and Kafka, F. Basic Statistics (Chs. 10—12). 
W. W. Norton, 1957, and Oxford & IBH, 1965. 

[3] Wallis, W. A. and Roberts, H. V. Statistics: a New Approach 
(Ch. 7). Methuen, 1957. 

[4] Yule, G. U. and Kendall, M. G. Introduction to the Theory of 
Statistics (Ch. 5). Charles Griffin, 1953.


8 MEASURES OF 
DISPERSION 


8.1 Meaning of dispersion 

The average of a variable gives a general idea as to the whole 
set of its values. It is clear, however, that for a variable to be 
really variable, its given values will not be all equal to the average. 
In some cases they may lie very near the average, while in others 
they may be widely scattered about it. 

An example will make the point clearer. Suppose two students,
A and B, in a college received in eight monthly examinations the
following marks in a particular subject :

Marks obtained by A | Marks obtained by B
        63          |         61
        47          |         54
        56          |         56
        44          |         57
        66          |         60
        65          |         59
        80          |         55
        43          |         62


If the arithmetic mean is taken to be the proper average to be 
used in this case, we find that the average score of each student was 
the same, viz. 58. Yet the overall nature of the scores was not at all 
the same for the two students. Thus A received as high a score as 
80 and as low a score as 43. On the other hand, B’s score remained
near about 58 throughout. In short, B gave a more consistent
performance than A. 

Thus, in order to give a proper idea about the overall nature of 
the given values of a variable, it is necessary, besides mentioning the 
average value, to state how scattered the given values are about the 
average. Mainly three different measures are used to determine this 
feature of a variable, which is called its scatter or dispersion. (It may 
be said that while the central tendency of a variable is the tendency 



of its values to be similar, its dispersion represents the tendency of the 
values to be different.) These measures are (a) the range, (b) the mean 
deviation and (c) the standard deviation.

8.2 Range

The simplest measure of the dispersion of a variable is its range, 
which is defined as the difference between its highest and lowest 
given values. In the above example, the range of marks obtained is 
80—43=37 for A and 62—54=8 for B. 

8.3 Mean deviation 

If A be the chosen average value of the variable x, then $x_i - A$ is
the deviation of the ith given value of x from the average. Clearly,
the higher the deviations

$$x_1 - A,\; x_2 - A,\; \ldots,\; x_n - A$$

in magnitude, the higher is the dispersion of x. One may, therefore,
consider some way of combining the deviations to get a measure of
dispersion. It is readily seen that the simple arithmetic mean of the
deviations, viz. $\frac{1}{n}\sum_i (x_i - A)$, cannot serve this purpose. For the sum of
the deviations (and, proportionately, the arithmetic mean) may be
quite small even when the individual deviations are large, positive and
negative deviations almost cancelling each other. In fact, if A be the
arithmetic mean of x, this sum vanishes, whatever the deviations are
individually. This difficulty may be overcome by considering, instead
of the deviations themselves, their absolute values, in which case
their magnitude alone (and not their signs) will be taken into account.
The arithmetic mean of these absolute deviations may be taken as the
required measure of dispersion. It is referred to as the mean deviation
of x about A. Denoting this mean deviation by $MD_A$, we have thus

$$MD_A = \frac{1}{n}\sum_{i=1}^{n} |x_i - A|. \tag{8.1}$$


It can be shown that the mean deviation is least when measured 
about the (or a) median. A simple proof of the proposition is given 
below : 

Let the given values of x be arranged in an increasing order of 
magnitude, x, being the ith value in this order. Obviously, we have 
X¢4) Sq) eevee SX mye 

¥s(x)—15 


226 SUNDAMENTALS OF STATISTIOS 


Case 1. Let n be even and equal to 2m. The sum $|x_{(1)} - A| + |x_{(2m)} - A|$
will be least, having the value $x_{(2m)} - x_{(1)}$, if A lies anywhere
between $x_{(1)}$ and $x_{(2m)}$. Similarly, $|x_{(2)} - A| + |x_{(2m-1)} - A|$ is
least when A lies between $x_{(2)}$ and $x_{(2m-1)}$; and so on. Lastly,
$|x_{(m)} - A| + |x_{(m+1)} - A|$ is least when A lies between $x_{(m)}$ and $x_{(m+1)}$.

Each of these sums will be a minimum, and hence

$MD_A = \frac{1}{n}\sum_i |x_{(i)} - A|$

will be a minimum, when $x_{(m)} \le A \le x_{(m+1)}$.

Case 2. Let n be odd and equal to 2m+1. As in the previous
case, $|x_{(1)} - A| + |x_{(2m+1)} - A|$ is least if A is taken in between $x_{(1)}$
and $x_{(2m+1)}$. We take sums of pairs in this way, the last of this type
being $|x_{(m)} - A| + |x_{(m+2)} - A|$, which is a minimum when A lies
between $x_{(m)}$ and $x_{(m+2)}$. Finally, $|x_{(m+1)} - A|$ is smallest, i.e. equal
to 0, if $A = x_{(m+1)}$.

Hence here $MD_A$ will be a minimum when $A = x_{(m+1)}$.

In either case, it is seen that A should be the (or a) median of x if
$MD_A$ is to be a minimum.

Owing to this result, it would seem proper to use the (or a) median
as origin in computing the mean deviation. In practice, however, the 
mean deviation is generally computed about the arithmetic mean. 
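
The minimising property can also be checked numerically. The following is a minimal Python sketch, not part of the original text: the function name `mean_deviation` and the grid of trial origins are illustrative assumptions, and the data are the twelve yields listed in Table 8.1.

```python
from statistics import mean, median

def mean_deviation(values, origin):
    """Mean of the absolute deviations of `values` about `origin`, as in (8.1)."""
    return sum(abs(x - origin) for x in values) / len(values)

# Yields (gm.) of the 12 tomato plants, as listed in Table 8.1.
yields = [1216, 1374, 1167, 1232, 1407, 1453, 1202, 1372, 1278, 1141, 1221, 1329]

md_median = mean_deviation(yields, median(yields))  # origin at the median
md_mean = mean_deviation(yields, mean(yields))      # origin at the arithmetic mean

# Try every whole-gram origin between the smallest and largest value;
# no trial origin should give a smaller mean deviation than the median does.
grid = range(min(yields), max(yields) + 1)
best_on_grid = min(mean_deviation(yields, a) for a in grid)

print(round(md_median, 2), round(md_mean, 2), round(best_on_grid, 2))
```

Since n is even here, every origin between the 6th and 7th ordered values gives the same minimum, in line with Case 1 of the proof.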

Example 8.1 Consider the data of Table 7.1. Here the median 
may be taken to be the arithmetic mean of the 6th and 7th values in 
the arrangement of the data in an ascending order of magnitude, viz. 
1,232 gm. and 1,278 gm. Thus the median is Mi=1,255 gm. The 
mean deviation about Mi is obtained with the help of Table 8.1. 

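Thus the mean deviation about median is $MD_{Mi} = 1{,}034/12 = 86.17$ gm.,
1,034 gm. being the total of the absolute deviations in Table 8.1.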


If, instead, we want to calculate the mean deviation about the
mean, then some simplification can be made.
We may write

$\sum_i |x_i - \bar{x}| = \sum_1 (\bar{x} - x_i) + \sum_2 (x_i - \bar{x}),$

$\sum_1$ containing the values of x that are less than or equal to $\bar{x}$ and
$\sum_2$ the values of x that are greater than $\bar{x}$.

But
$\sum_i (x_i - \bar{x}) = 0,$
that is,
$\sum_1 (x_i - \bar{x}) + \sum_2 (x_i - \bar{x}) = 0,$
so that
$\sum_1 (\bar{x} - x_i) = \sum_2 (x_i - \bar{x}).$
Hence

$MD_{\bar{x}} = 2\sum_1 (\bar{x} - x_i)/n = 2\sum_2 (x_i - \bar{x})/n.$

Therefore, it is enough to consider only one of the two sets of
values of x in computing $MD_{\bar{x}}$.


TABLE 8.1

DETERMINATION OF MEAN DEVIATION ABOUT MEDIAN
FOR THE DATA OF TABLE 7.1

Yield (gm.)     $|x_i - Mi|$
  $x_i$
  1,216              39
  1,374             119
  1,167              88
  1,232              23
  1,407             152
  1,453             198
  1,202              53
  1,372             117
  1,278              23
  1,141             114
  1,221              34
  1,329              74
  Total           1,034


Example 8.2 For the data of Table 7.1, $\bar{x} = 1{,}282.67$ gm. The
values of x which are smaller than or equal to $\bar{x}$ are (in gm.) 1216,
1167, 1232, 1202, 1278, 1141 and 1221. The absolute deviations
from $\bar{x}$ are (in gm.) :

66.67, 115.67, 50.67, 80.67, 4.67, 141.67 and 61.67, and their
sum is 521.69 gm.

Hence $MD_{\bar{x}} = 2 \times 521.69/12 = 86.95$ gm.

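The one-sided shortcut used in Example 8.2 can be compared with the direct definition in a few lines of Python. This is only a sketch: the two function names are assumptions made for illustration, and the unrounded mean is used, so the result differs slightly from the rounded hand computation.

```python
def md_about_mean_direct(values):
    """Mean deviation about the arithmetic mean, straight from the definition."""
    n = len(values)
    xbar = sum(values) / n
    return sum(abs(x - xbar) for x in values) / n

def md_about_mean_one_sided(values):
    """Same quantity, using only the values that do not exceed the mean."""
    n = len(values)
    xbar = sum(values) / n
    lower_part = sum(xbar - x for x in values if x <= xbar)
    return 2 * lower_part / n

# Yields (gm.) of the 12 tomato plants, as listed in Table 8.1.
yields = [1216, 1374, 1167, 1232, 1407, 1453, 1202, 1372, 1278, 1141, 1221, 1329]

direct = md_about_mean_direct(yields)
shortcut = md_about_mean_one_sided(yields)
print(round(direct, 2), round(shortcut, 2))
```
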
8.4 Standard deviation 

In considering the deviations $x_i - A$ for obtaining a measure of
dispersion, we may get rid of their signs (thus taking their magnitude
only into account) by taking, instead of their absolute values $|x_i - A|$,
their squares $(x_i - A)^2$. Like the absolute values, these squares will
also reflect the dispersion of the variable about A. The positive
square-root of the arithmetic mean of these quantities, i.e.

$\sqrt{\frac{1}{n}\sum_i (x_i - A)^2},$ ... (8.2)


which is called the root-mean-square deviation about A, may then be 
accepted as a measure of dispersion alternative to the mean deviation. 
The square-root is taken in order to express the measure in the same 
units as those of x. 


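(8.2) is least when $A = \bar{x}$, as is evident from the fact that
$\sum_i (x_i - A)^2 = \sum_i \{(x_i - \bar{x}) + (\bar{x} - A)\}^2$
$= \sum_i (x_i - \bar{x})^2 + 2(\bar{x} - A)\sum_i (x_i - \bar{x}) + n(\bar{x} - A)^2$
$= \sum_i (x_i - \bar{x})^2 + n(\bar{x} - A)^2$
[since $\sum_i (x_i - \bar{x}) = 0$, from (7.3)]
$\ge \sum_i (x_i - \bar{x})^2,$
the equality sign being valid when and only when $A = \bar{x}$.
The measure of dispersion obtained by putting $\bar{x}$ for A in (8.2)
is called the standard deviation of x and is denoted by s (or $s_x$). We
have, therefore,

$s = \sqrt{\frac{1}{n}\sum_i (x_i - \bar{x})^2}.$ ... (8.3)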


For computational purposes (8.3) may be expressed in a simpler 
form. We have 


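$\sum_i (x_i - \bar{x})^2 = \sum_i x_i^2 - 2\bar{x}\sum_i x_i + n\bar{x}^2$

$= \sum_i x_i^2 - n\bar{x}^2$, since $\sum_i x_i = n\bar{x}$.

Hence

$s = \sqrt{\frac{1}{n}\sum_i x_i^2 - \bar{x}^2}.$ ... (8.4)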


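For grouped data, the standard deviation is given by

$s = \sqrt{\frac{1}{n}\sum_{i=1}^{k} (x_i - \bar{x})^2 f_i},$ ... (8.5)

the symbols having the same significance as before. This becomes,
on simplification,

$s = \sqrt{\frac{1}{n}\sum_{i=1}^{k} x_i^2 f_i - \bar{x}^2}.$ ... (8.6)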


When the values of x are very large in magnitude, the computa- 
tion of s by means of the above formula becomes laborious, as the 
squaring process yields still larger values. Short-cut methods of 
computing s will be discussed in Chapter 9. 

Some properties of the standard deviation are given below : 


(a) Suppose that the given values of x are all equal : $x_i = a$, say,
for i = 1, 2, ..., n. Then $\bar{x} = \sum_i x_i/n = a$, so that for each i, $x_i - \bar{x} = 0$.
Hence
$s = \sqrt{\frac{1}{n} \times n \times 0} = 0.$ ... (8.7)

Thus the s.d. of a variable whose values are all equal must be zero.
The converse also is true, as the reader can see for himself.

(b) Let y be equal to a+bx; then $\bar{y} = a + b\bar{x}$. Hence
$y_i - \bar{y} = b(x_i - \bar{x})$ for each i.
It follows that
$\sum_i (y_i - \bar{y})^2 = b^2 \sum_i (x_i - \bar{x})^2.$
Dividing both sides by n and taking positive square-roots, one has
$s_y = |b|\, s_x,$ ... (8.8)

where the symbols $s_x$ and $s_y$ denote the s.d.s of x and y, respectively.
(c) Let there be two sets of values of x with $n_1$ and $n_2$ values,
and let $\bar{x}_1$, $\bar{x}_2$ be their means and $s_1$, $s_2$ their s.d.s. Then the s.d. of x
for the two sets pooled together can be expressed in terms of $n_1$, $n_2$,
$\bar{x}_1$, $\bar{x}_2$ and $s_1$, $s_2$. By (7.5), the grand mean of x is

$\bar{x} = \frac{n_1\bar{x}_1 + n_2\bar{x}_2}{n_1 + n_2}.$

Denoting by $x_{1j}$ (j = 1, 2, ..., $n_1$) and by $x_{2j}$ (j = 1, 2, ..., $n_2$)
the values in the two sets, we may write the sum of squares of the
deviations of the values from $\bar{x}$ as

$(n_1 + n_2)s^2 = \sum_j (x_{1j} - \bar{x})^2 + \sum_j (x_{2j} - \bar{x})^2,$
where $s^2$ is the total variance of x. But
$\sum_j (x_{1j} - \bar{x})^2 = \sum_j \{(x_{1j} - \bar{x}_1) + (\bar{x}_1 - \bar{x})\}^2$

$= \sum_j (x_{1j} - \bar{x}_1)^2 + n_1(\bar{x}_1 - \bar{x})^2$

$= n_1 s_1^2 + n_1(\bar{x}_1 - \bar{x})^2.$
Similarly,
$\sum_j (x_{2j} - \bar{x})^2 = n_2 s_2^2 + n_2(\bar{x}_2 - \bar{x})^2.$
Thus
$s^2 = \frac{n_1 s_1^2 + n_2 s_2^2}{n_1 + n_2} + \frac{n_1(\bar{x}_1 - \bar{x})^2 + n_2(\bar{x}_2 - \bar{x})^2}{n_1 + n_2}.$ ... (8.9)
Generally, if there be t sets with $n_1, n_2, \ldots, n_t$ values, having
means $\bar{x}_1, \bar{x}_2, \ldots, \bar{x}_t$ and s.d.s $s_1, s_2, \ldots, s_t$, then the s.d. of x for
all the sets taken together is given by

$s = \sqrt{\frac{\sum_{i=1}^{t} n_i s_i^2}{\sum_{i=1}^{t} n_i} + \frac{\sum_{i=1}^{t} n_i(\bar{x}_i - \bar{x})^2}{\sum_{i=1}^{t} n_i}},$ ... (8.10)

where $\bar{x}$ is the grand mean of x.

Example 8.3 The result in (b) above shows that the standard
deviation remains unchanged under a change of origin. This often
means a simplification in the computation of the standard deviation.

This is illustrated here for the data of Table 7.1.




We are taking a new variable 
u=x—1,200, 
the origin 1,200 being chosen to make the values of u small enough. 


TABLE 8.2
DETERMINATION OF S.D. FOR THE DATA OF TABLE 7.1

Yield (gm.)   $u_i = x_i - 1{,}200$     $u_i^2$
  $x_i$
  1,216            16                256
  1,374           174             30,276
  1,167           -33              1,089
  1,232            32              1,024
  1,407           207             42,849
  1,453           253             64,009
  1,202             2                  4
  1,372           172             29,584
  1,278            78              6,084
  1,141           -59              3,481
  1,221            21                441
  1,329           129             16,641
  Total           992            195,738

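Here

$\bar{u} = \sum_i u_i/12 = 992/12 = 82.67$ gm.,

$s_u^2 = \frac{1}{12}\sum_i u_i^2 - \bar{u}^2$
$= \frac{1}{12} \times 195{,}738 - (82.67)^2$
$= 16{,}311.5 - 6{,}834.33$
$= 9{,}477.17$ (gm.)$^2$.

Hence
$s_x = s_u = \sqrt{9{,}477.17} = 97.35$ gm.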
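
Properties (b) and (c), together with the change of origin used in this example, can be verified on the same data. The sketch below is illustrative only: the function names, the constants a = 3 and b = -2, and the split of the data into the first five and the last seven values are all assumptions chosen for the check.

```python
from math import sqrt

def mean(values):
    return sum(values) / len(values)

def sd(values):
    """Standard deviation with divisor n, computed by formula (8.4)."""
    n = len(values)
    xbar = sum(values) / n
    return sqrt(sum(x * x for x in values) / n - xbar * xbar)

def pooled_sd(n1, mean1, s1, n2, mean2, s2):
    """Standard deviation of two sets pooled together, following (8.9)."""
    n = n1 + n2
    grand = (n1 * mean1 + n2 * mean2) / n
    within = (n1 * s1 ** 2 + n2 * s2 ** 2) / n
    between = (n1 * (mean1 - grand) ** 2 + n2 * (mean2 - grand) ** 2) / n
    return sqrt(within + between)

# Yields (gm.) of the 12 tomato plants (Table 7.1, as listed in Table 8.2).
yields = [1216, 1374, 1167, 1232, 1407, 1453, 1202, 1372, 1278, 1141, 1221, 1329]

s_x = sd(yields)
s_u = sd([x - 1200 for x in yields])     # change of origin: u = x - 1,200
s_y = sd([3 - 2 * x for x in yields])    # y = a + bx with hypothetical a = 3, b = -2

first, second = yields[:5], yields[5:]   # an arbitrary split into two sets
s_pooled = pooled_sd(len(first), mean(first), sd(first),
                     len(second), mean(second), sd(second))

print(round(s_x, 2), round(s_u, 2), round(s_y, 2), round(s_pooled, 2))
```

The first, second and fourth printed values agree, and the third is twice as large, as (8.8) and (8.9) require.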


8.5 Comparison of range, mean deviation and standard 
deviation 
All three measures of dispersion so far considered are rigidly 
defined. The range is, however, inferior in one respect : it becomes 
meaningless when either of the two limits of the variable happens 
to be infinite. 


The range is the easiest to compute. The other two require 
almost the same amount of computational labour. 

The significance of the range is easily comprehensible. The general 
nature of the mean deviation also is readily understood. The standard 
deviation, on the other hand, has a comparatively abstract nature. 

Both the mean deviation and the standard deviation are based on 
all the given values of the variable. And hence, properly speaking, 
they characterise the whole set of values. The range is inferior in this 
respect, being based only on the two extreme values of the set. In 
fact, it often fails to give a proper idea of the dispersion. Suppose,
for instance, that a variable assumes in one case the values 

25, 25, 25, 25, 60, 60, 60, 60 
and in another the values
25, 31, 32, 44, 53, 56, 59, 60. 

The range is the same in the two cases, viz. 60—25=35, but 
the dispersion cannot be said to be the same. For while in the first 
case four of the values are equal to 25 and four are equal to 60, in 
the second there is only one 25 and only one 60, the other values 
lying near about the average. The actual dispersion of the former
set is, therefore, much larger than that of the latter. 
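
The point can be made concrete by computing all three measures for the two sets. The following Python sketch is illustrative; the helper names are assumptions, and the s.d. uses the divisor n as in (8.3).

```python
from math import sqrt

def value_range(values):
    return max(values) - min(values)

def mean_deviation_about_mean(values):
    xbar = sum(values) / len(values)
    return sum(abs(x - xbar) for x in values) / len(values)

def standard_deviation(values):
    xbar = sum(values) / len(values)
    return sqrt(sum((x - xbar) ** 2 for x in values) / len(values))

first = [25, 25, 25, 25, 60, 60, 60, 60]
second = [25, 31, 32, 44, 53, 56, 59, 60]

for name, data in (("first", first), ("second", second)):
    print(name,
          value_range(data),
          round(mean_deviation_about_mean(data), 2),
          round(standard_deviation(data), 2))
```

Both sets have range 35, but the mean deviation and the standard deviation are both larger for the first set, which is what the argument above asserts.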

The standard deviation has certain desirable properties which 
make it easily amenable to algebraical treatment. No such properties
are, however, possessed by the range or the mean deviation. 

It follows, therefore, that the standard deviation may be generally 
regarded as the best measure of dispersion, just as the arithmetic 
mean generally serves as the best measure of central tendency. In 
some particular cases the range is employed, instead of the standard 
deviation, because it is much easier to compute. This is often done,
for example, in statistical quality control, where the analysis of the data 
must be carried out immediately after they are collected in order 
that effective action may be taken on the basis of the analysis. 




8.6 Measures based on mutual differences of observations 

Some statisticians suggest measures of dispersion that do not 
depend on any particular measure of central tendency, the choice 
of which in any case is bound to be more or less arbitrary. The 
notion of central tendency, they argue, is not relevant when we 
are looking for a measure of dispersion. Since dispersion really 
means the extent to which the given values of the variable differ 
from one another (rather than from any arbitrarily chosen average), 
any proper measure of dispersion, in their opinion, should be based 
solely on the mutual differences $x_i - x_j$ (i, j = 1, 2, ..., n) of the
values of x.

A measure of this type, suggested by C. Gini, is the mean of 
the absolute values of all n? mutual differences. Called Gini’s mean 
difference, it is thus given by the formula 


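$\Delta_1 = \frac{1}{n^2}\sum_{i=1}^{n}\sum_{j=1}^{n} |x_i - x_j|.$ ... (8.11)

In case of grouped data, we have, in the usual notation,

$\Delta_1 = \frac{1}{n^2}\sum_{i=1}^{k}\sum_{j=1}^{k} f_i f_j |x_i - x_j|.$ ... (8.11a)

However, this measure is, like the mean deviation, difficult to
handle mathematically. The following measure, which is based on
the squares of mutual differences of the observations, (or its positive
square-root) is better in this respect :

$\Delta_2 = \frac{1}{n^2}\sum_{i=1}^{n}\sum_{j=1}^{n} (x_i - x_j)^2,$ ... (8.12)

$\Delta_2 = \frac{1}{n^2}\sum_{i=1}^{k}\sum_{j=1}^{k} f_i f_j (x_i - x_j)^2.$ ... (8.12a)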


The merit of such measures is apparent from what we have said 
at the beginning of this section. Even so, it is doubtful whether 
a measure other than the standard deviation is at all necessary. 
Indeed, the standard deviation itself has this property of being based 
solely on the differences of the observations among themselves. 


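For $\sum_i\sum_j (x_i - x_j)^2 = \sum_i\sum_j \{(x_i - \bar{x}) - (x_j - \bar{x})\}^2$
$= n\sum_i (x_i - \bar{x})^2 - 2\sum_i (x_i - \bar{x})\sum_j (x_j - \bar{x}) + n\sum_j (x_j - \bar{x})^2$

$= 2n^2 s^2,$
so that

$s = \sqrt{\Delta_2/2}.$ ... (8.13)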


We should rather say that here we have one more reason why the 
standard deviation should generally be regarded as the best measure 
of dispersion.
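
Both measures, and the link between the squared-difference measure and the standard deviation, can be checked by brute force over all $n^2$ pairs. This is a sketch with assumed function names, applied to the second set of values from Section 8.5.

```python
from math import sqrt

def gini_mean_difference(values):
    """Mean of |x_i - x_j| over all n^2 ordered pairs (including i = j)."""
    n = len(values)
    return sum(abs(a - b) for a in values for b in values) / n ** 2

def mean_squared_difference(values):
    """Mean of (x_i - x_j)^2 over all n^2 ordered pairs."""
    n = len(values)
    return sum((a - b) ** 2 for a in values for b in values) / n ** 2

def standard_deviation(values):
    xbar = sum(values) / len(values)
    return sqrt(sum((x - xbar) ** 2 for x in values) / len(values))

data = [25, 31, 32, 44, 53, 56, 59, 60]

delta_1 = gini_mean_difference(data)
delta_2 = mean_squared_difference(data)
s = standard_deviation(data)

# The s.d. equals the square-root of half the mean squared difference.
print(round(delta_1, 3), round(sqrt(delta_2 / 2), 3), round(s, 3))
```
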

8.7 Quartile deviation 

In the previous chapter, we defined the median as a value of the 
variable such that half of the given values are less than or equal to it 
and the remaining half are greater than or equal to it. The general 
name for a measure of this type is quantile. The quantile or fractile
of order p (or the p-quantile) is a value of the variable such that a
proportion p of the total number of given values are less than or
equal to it and a proportion (1 - p) are greater than or equal to it.
For a continuous variable this quantile (here denoted by $z_p$) may be
approximately determined by the formula

$z_p = x_l + \frac{np - n_l}{f_0} \times c,$ ... (8.14)

where $x_l$ = lower boundary of the class-interval in which $z_p$ lies,
c = width of this class-interval,
$n_l$ = cumulative frequency (of the 'less-than' type) corresponding
to $x_l$ and
$f_0$ = frequency in this class.

A frequency distribution may be briefly described by giving the
values of some of its quantiles. Generally, it will be enough to give,
together with the median, which is $z_{1/2}$, the quantiles $z_{1/4}$ and $z_{3/4}$.
These three, $z_{1/4}$, $z_{1/2}$ and $z_{3/4}$, taken together divide, as it were, the
frequency distribution of the variable into four equal parts. Hence
they are also called the quartiles. $z_{1/4}$ is the first or lower quartile
(denoted by $Q_1$), the median is the second quartile ($Mi = Q_2$), and
$z_{3/4}$ the third or upper quartile ($Q_3$).

The lower and upper quartiles provide us with another measure
of dispersion. This measure is

$Q = \frac{Q_3 - Q_1}{2},$ ... (8.15)

called the quartile deviation or semi-interquartile range.

The quartile deviation is rarely used, except when the computation
of the standard deviation is extremely difficult or impossible,
for instance, when the observations are given in a frequency table
with class-intervals of varying width or with one or both of the
terminal classes undefined.

Example 8.4 Consider the frequency distribution of Table 6.10.
Here n/4 = 44.25. Hence we find, on going through the cumulative
frequencies of the 'less-than' type, that the first quartile lies between
$x_l$ = 159.55 cm. and $x_u$ = 164.55 cm., for which the cumulative
frequencies are $n_l$ = 28 and $n_u$ = 86. Here c = 5 and $f_0$ = 58. Hence, from
formula (8.14),

$Q_1 = 159.55 + \frac{44.25 - 28}{58} \times 5$

$= 159.55 + 1.401 = 160.951$ cm.
Similarly, since

3n/4 = 132.75,
$Q_3 = 164.55 + \frac{132.75 - 86}{60} \times 5$
$= 164.55 + 3.896 = 168.446$ cm.

The quartile deviation is, therefore,
$Q = \frac{168.446 - 160.951}{2} = 3.748$ cm.

8.8 Measures of relative dispersion 

The measures of dispersion we have discussed above are all 
expressed in the same units as those of the variable. As such they 
cannot be used in comparing two distributions of different types with 
respect to their variability. A difficulty is encountered, for example, 
when we want to compare the dispersion of a set of heights (given in, 
say, cm.) with the dispersion of a set of weights (given in, say, kg.). 
For purposes of such comparison, therefore, a measure of dispersion
has to be made free from the units of the variable. The simplest
procedure is to express a measure of dispersion as a percentage of a
measure of central tendency. The most commonly used measure of
this type is the coefficient of variation,

$v = 100 \times \frac{s}{\bar{x}},$ ... (8.16)

where $\bar{x}$ is supposed to be non-zero.

Besides the above use, such measures serve another purpose. 
Suppose repeated measurements are being taken of two rods, one of 
length 10 cm. and another of length 100 cm. The means of the
measurements will be 10 cm. and 100 cm. respectively, provided the
measurements are free from bias. Let the standard deviation of each
set of measurements be 2 cm. But a standard deviation of 2 cm. in
the first case does not mean the same thing as a standard deviation
of 2 cm. in the second. For the first set of measurements is then
much less accurate than the second set of measurements. The
coefficient of variation, on the other hand, will give a true picture of
their relative accuracy. Thus it may be useful even when we want
to compare sets of data expressed in the same units. Other examples
of this kind are comparing the variability of weights of one-year-olds
with that of ten-year-olds, the variability of diameter for ball bearings 
used in trucks with that for ball bearings used in bicycles, etc. 
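
For the two rods, formula (8.16) gives the following figures. The function name in this short Python sketch is an assumption made for illustration.

```python
def coefficient_of_variation(sd, mean):
    """Coefficient of variation (8.16): the s.d. as a percentage of the mean."""
    if mean == 0:
        raise ValueError("the mean must be non-zero")
    return 100 * sd / mean

v_short_rod = coefficient_of_variation(2, 10)    # rod of length 10 cm.
v_long_rod = coefficient_of_variation(2, 100)    # rod of length 100 cm.
print(v_short_rod, v_long_rod)
```

The same standard deviation of 2 cm. thus corresponds to a relative dispersion ten times as large for the shorter rod.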

8.9 Curve of concentration 

A special type of cumulative frequency graph, known as a curve of
concentration or Lorenz curve, is useful in studying the concentration of
wealth or income in relation to certain segments of the population
and in similar other situations.

Let F(x) denote the percent cumulative frequency for the variable
up to the value x and let $\Phi(x)$ denote the percent cumulative total
for the variable up to the value x. Naturally, both F and $\Phi$ vary
from 0 to 100.

The curve obtained by plotting $\Phi$ against F for different fixed
values of x is known as the curve of concentration or Lorenz curve.
It can be shown that the curve is necessarily concave upwards (as in
Fig. 8.1).

The line $\Phi = F$ is called the line of equal distribution. Such a
curve would indicate that any specified proportion of persons would
have precisely the same proportion of total value. In the case of an
income distribution, it would mean that 20% of persons would earn
20% of the income, 50% of persons would earn 50% of the income,
75% of persons would receive 75% of the income, and so on.

The greater the departure of the Lorenz curve from the line of
equal distribution, the higher is the concentration of the total value
(say income) in a few individuals. Thus in a particular case, if we
find that 50% of the persons receive only 20% of the total income,
75% of the persons receive only 30%, 90% receive 50%, and so on,
it means that there is a high degree of income concentration in a few
individuals in the upper income groups. Thus the area between the
line of equal distribution and the curve of concentration, called the
area of concentration, is an indicator of the degree of concentration ;
the larger the area the more is the concentration. Twice the area
is Gini's coefficient of concentration.
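
A small numerical sketch may help fix the construction. The incomes below are hypothetical, the function names are assumptions, and F and $\Phi$ are measured as proportions (0 to 1) rather than percentages, so that twice the area of concentration lies between 0 and 1. The area is found by the trapezoidal rule on the plotted points.

```python
def lorenz_points(incomes):
    """Points (F, Phi) of the Lorenz curve, as proportions, starting at (0, 0)."""
    ordered = sorted(incomes)
    n, total = len(ordered), sum(ordered)
    points = [(0.0, 0.0)]
    running = 0
    for i, x in enumerate(ordered, start=1):
        running += x
        points.append((i / n, running / total))
    return points

def gini_coefficient(points):
    """Twice the area between the line Phi = F and the Lorenz curve."""
    area_under_curve = sum((f1 - f0) * (p0 + p1) / 2
                           for (f0, p0), (f1, p1) in zip(points, points[1:]))
    return 2 * (0.5 - area_under_curve)

incomes = [10, 20, 30, 40, 100]        # hypothetical incomes of five persons
gini = gini_coefficient(lorenz_points(incomes))
gini_equal = gini_coefficient(lorenz_points([50, 50, 50, 50]))
print(round(gini, 3), round(gini_equal, 3))
```

With all incomes equal the curve coincides with the line of equal distribution and the coefficient is zero.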

The Lorenz curve for the income-distribution of Table 8.3 has
been drawn in Fig. 8.1. The calculations are also shown in the table.


TABLE 8.3
CALCULATIONS FOR DRAWING THE LORENZ CURVE FOR
THE DATA OF EXERCISE 7.20

Monthly     Mid-    Frequency  Cumulative  Total     Cumulative  Percent     Percent
income      value              frequency   income    total       cumulative  cumulative
(Rs.)                                                            frequency   total
                                                                 F           $\Phi$
150-200     175     114        248         19,950     34,050      42.76       27.64
200-250     225     195        443         43,875     77,925      76.38       63.25
250-300     275      63        506         17,325     95,250      87.24       77.31
300-350     325      32        538         10,400    105,650      92.76       85.75
350-400     375      20        558          7,500    113,150      96.20       91.84
400-450     425      11        569          4,675    117,825      98.10       95.64
450-500     475       8        577          3,800    121,625      99.48       98.72
500-        525       3        580          1,575    123,200     100.00      100.00


[Figure: percent cumulative total income (vertical axis) plotted against
percent cumulative frequency (horizontal axis), each from 0 to 100.]

Fig. 8.1 The curve of concentration and the area of
concentration for the data of Exercise 7.20.


Questions and exercises 


8.1 What is dispersion ? What are the common measures of
dispersion ?

8.2 How, in your opinion, should a measure of dispersion
change when all values of the variable are increased or decreased

(a) by the same amount ?
(b) in the same proportion ?

Judge in this light the different measures of dispersion considered
in this chapter.

8.3 Compare the range, the mean deviation and the standard
deviation as measures of dispersion.

8.4 Define standard deviation. State and prove its important
properties.


8.5 What is meant by relative dispersion ? Define coefficient 
of variation and explain its uses. 

8.6 Obtain the mean and standard deviation of the first n
natural numbers. Ans. $(n+1)/2$; $\sqrt{(n^2-1)/12}$.




8.7 Prove that for any set of values $x_1, x_2, \ldots, x_n$,
$x_1^2 + x_2^2 + \ldots + x_n^2 \ge \frac{(x_1 + x_2 + \ldots + x_n)^2}{n}.$


8.8 Show that the mean deviation about mean cannot exceed
the standard deviation. When are the two equal ?
8.9 Suppose the given values of x (viz. $x_i$ for i = 1, 2, ..., n)
are such that $a \le x_i \le b$ for each i.
Show that (i) $a \le \bar{x} \le b$ and (ii) $0 \le s^2 \le (b-a)^2/4$.
[Hint : For (ii), use the result

$\sum_i \left(x_i - \frac{a+b}{2}\right)^2 = \sum_1 \left(x_i - \frac{a+b}{2}\right)^2 + \sum_2 \left(x_i - \frac{a+b}{2}\right)^2,$

where for each i, $x_i$ is included in $\sum_1$ or $\sum_2$ according as $x_i \le (a+b)/2$
or $x_i > (a+b)/2$, so that in either case $|x_i - \frac{a+b}{2}| \le \frac{1}{2}(b-a)$.]
8.10 Let s and R be, respectively, the standard deviation and
range of a set of n values of x. Show that
$\frac{R^2}{2n} \le s^2 \le \frac{R^2}{4}.$
When do the equalities hold ?
8.11 (a) Show that relation (8.9) can be expressed in the alternative
form

$s^2 = \frac{n_1 s_1^2 + n_2 s_2^2}{n_1 + n_2} + \frac{n_1 n_2}{(n_1 + n_2)^2}(\bar{x}_1 - \bar{x}_2)^2.$

(b) In a batch of 10 children, the I.Q. of a dull boy is 36
below the average I.Q. of the other children. Show that the s.d. of
I.Q. for all the children cannot be less than 10.8. If this standard
deviation is actually 11.4, determine what the standard deviation
will be when the dull boy is left out. Ans. 3.9.

8.12 Let x be a variable assuming the values 1, 2, ..., k
with frequencies $f_i$ and let $F_i'$ be the corresponding cumulative
frequencies of the 'greater-than' type, while $F_i''$ are the cumulative
totals of the 'greater-than' type of these cumulative frequencies. If
n be the total frequency, and

$T_1 = \frac{1}{n}\sum_i F_i'$, $T_2 = \frac{1}{n}\sum_i F_i''$,
show that $s^2 = 2T_2 - T_1 - T_1^2$
(vide Exercise 7.8).




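8.13 (Continuation) Let there be k class-intervals in a frequency
table, each of width c. If $f_i$ be the frequency for the ith class and $x_i$
the corresponding class-mark, and if

$F_i = \sum_{j=1}^{i} f_j$, $F_i' = \sum_{j=1}^{i} F_j$, $S_1 = \frac{1}{n}\sum_i F_i$, $S_2 = \frac{1}{n}\sum_i F_i'$,
n being the total frequency, show that

$\bar{x} = x_k - c(S_1 - 1),$
$s^2 = c^2(2S_2 - S_1 - S_1^2).$

[Hints : $\bar{x} = \frac{1}{n}\sum_i (x_i - x_k - c)f_i + x_k + c$

and $s^2 = \frac{1}{n}\sum_i (x_i - x_k - c)^2 f_i - \left\{\frac{1}{n}\sum_i (x_i - x_k - c)f_i\right\}^2$.]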
8.14 If
$y_2 = (x_1 - x_2)/\sqrt{2},$
$y_3 = (x_1 + x_2 - 2x_3)/\sqrt{6},$
...
$y_n = [x_1 + x_2 + \ldots + x_{n-1} - (n-1)x_n]/\sqrt{n(n-1)},$
show that

$ns^2 = \sum_{i=2}^{n} y_i^2.$


8.15 Suppose that the variable x takes positive values only and
that the deviations $x_i - \bar{x}$ are small compared to $\bar{x}$. Show that in
such a case

s? 
(a) e E and (b) safii). 


8.16 (a) Show that the mean deviation $MD_A$ may be obtained
by the formula

$n\,MD_A = S_2 - S_1 + A(n_1 - n_2),$
where $S_1$ is the sum of the values that are less than A and $n_1$ is the
number of such values, while $S_2$ is the sum of the values that are
greater than A and $n_2$ is the number of such values.

(b) Hence supply an alternative proof for the result that $MD_A$
is a minimum when A is a median.


8.17 Determine the range, the mean deviation about mean and 
the standard deviation for the data of Exercise 7.16. 


Ans, 3-3 cm. ; 0-7) cm. ; 0-91 om. 


8.18 For the frequency distribution of Exercise 7.18, compute the
mean deviation about median and the standard deviation.
Ans. 1.494; 1.858.

8.19 Evaluate the three quartiles for the frequency distribution
of Exercise 7.19. Next determine the mean deviation about median,
the standard deviation and the quartile deviation.

Ans. 97.08, 108.41, 118.33 ; 13.36, 17.26, 10.63.

8.20 Compute the standard deviation of the age-distribution of
Bengali males given in Exercise 7.21 by using formula (8.10).

Ans. 20.39 yr.

8.21 For a set of 250 observations on a certain variable x,
the mean and standard deviation are, respectively, 65.7 and 4.4.
However, on scrutinising the data it is found that two observations,
which should correctly read as 71 and 83, had been wrongly recorded
as 91 and 80. Obtain the correct values of the mean and the
standard deviation. Ans. $\bar{x} = 65.6$, s = 4.2.

[Hint : Use formulae (7.5) and (8.10).]

8.22 The number of runs scored by cricketers A and B during a 
test series consisting of 5 test matches is shown below for each of the 
10 innings : 

Cricketer A—5, 26, 97, 76, 112, 89, 6, 108, 24, 16. 
Cricketer B—51, 47, 36, 60, 58, 39, 44, 42, 71, 50. 
Make a comparative study of their batting performance. 


SUGGESTED READING 


[1] Mills, F.C. Statistical Methods (Ch. 5). H. Holt, 1955. 

[2] Wallis, W. A. and Roberts, H. V. Statistics : a New Approach 
(Ch. 8). Methuen, 1957. 

[3] Yule, G. U. and Kendall, M. G. Introduction to the Theory of 
Statistics (Ch. 6). Charles Griffin, 1953. 




9 MOMENTS AND MEASURES OF 
SKEWNESS AND KURTOSIS 


9.1 Moments 
In the two preceding chapters, we considered the mean,

$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$ or $\bar{x} = \frac{1}{n}\sum_{i=1}^{k} x_i f_i,$
and the variance,
$s^2 = \frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})^2$ or $s^2 = \frac{1}{n}\sum_{i=1}^{k} (x_i - \bar{x})^2 f_i,$

as possible measures of the central tendency and dispersion of the
variable x.
The most general measure of this type is

$m_r' = \frac{1}{n}\sum_{i=1}^{n} (x_i - A)^r,$ ... (9.1)

which is called the rth moment of x about the origin A. If the given
values are classified into a frequency table, the formula takes the
form

$m_r' = \frac{1}{n}\sum_{i=1}^{k} (x_i - A)^r f_i,$ ... (9.1a)

$x_i$ being the class-mark of the ith class and $f_i$ its frequency.

When the origin of a moment is taken at the arithmetic mean of
the variable, it is called a central moment. Thus the rth central
moment, denoted by $m_r$, is

$m_r = \frac{1}{n}\sum_{i=1}^{n} (x_i - \bar{x})^r$
or $m_r = \frac{1}{n}\sum_{i=1}^{k} (x_i - \bar{x})^r f_i,$ ... (9.2)

according as the values are ungrouped or are grouped into a
frequency table.
Evidently, we have always

$m_0 = m_0' = 1$ and $m_1 = 0$.
The mean of a variable is its first moment about 0, while the
variance is the second central moment.


9.2 Central moments expressed in terms of moments about 
an arbitrary origin 
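When $m_1', m_2', \ldots, m_r'$ (the moments about an arbitrary origin
A) are given, the central moments up to that of the rth order can be
obtained by using certain algebraic relations connecting the two sets
of moments. These are deduced below :
We have

$(x_i - \bar{x})^r = \{(x_i - A) - (\bar{x} - A)\}^r$
$= (x_i - A)^r - \binom{r}{1}(x_i - A)^{r-1}(\bar{x} - A) + \binom{r}{2}(x_i - A)^{r-2}(\bar{x} - A)^2 - \ldots$
$+ (-1)^{r-1}\binom{r}{r-1}(x_i - A)(\bar{x} - A)^{r-1} + (-1)^r(\bar{x} - A)^r.$

Summing both sides for all i from 1 to n, and dividing by n,
we get

$m_r = m_r' - \binom{r}{1}m_{r-1}'m_1' + \binom{r}{2}m_{r-2}'m_1'^2 - \ldots$
$+ (-1)^{r-1}\binom{r}{r-1}m_1'm_1'^{r-1} + (-1)^r m_1'^r$

$= \sum_{j=0}^{r} (-1)^j \binom{r}{j} m_{r-j}' m_1'^j,$ ... (9.3)
since
$\bar{x} - A = \frac{1}{n}\sum_i (x_i - A) = m_1'.$ ... (9.4)

It is easily seen that relation (9.3) holds for moments obtained
from grouped data as well.

Some particular cases of (9.3) are :

$m_1 = m_1' - m_1' = 0,$

$m_2 = m_2' - 2m_1'm_1' + m_1'^2 = m_2' - m_1'^2,$

$m_3 = m_3' - 3m_2'm_1' + 3m_1'm_1'^2 - m_1'^3 = m_3' - 3m_2'm_1' + 2m_1'^3,$ ... (9.5)

$m_4 = m_4' - 4m_3'm_1' + 6m_2'm_1'^2 - 4m_1'm_1'^3 + m_1'^4$

$= m_4' - 4m_3'm_1' + 6m_2'm_1'^2 - 3m_1'^4.$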

In most practical cases, it will be sufficient to calculate $\bar{x}$, $m_2$, $m_3$
and $m_4$. These computations are greatly facilitated by first computing
moments about a suitably chosen origin and then using relations
(9.4) and (9.5). This will be evident from the following examples :

Example 9.1 Consider once again the data of Table 7.1, represent- 
ing yields in gm. of 12 tomato plants. Here the values are quite 
large, so that the direct computation of the first four moments will 
be extremely laborious, as this will require obtaining squares, cubes 
and fourth powers of the given quantities and finding their totals. 
The computational labour can be reduced a great deal by first taking 
deviations of the given values about a suitable origin and then 
computing moments about that origin. 

In the present case, we may take the origin at 1,300 gm. The 
different steps in the calculation of moments about 1,300 gm. are 
indicated in the following table : 


TABLE 9.1
CALCULATION OF MOMENTS FOR THE DATA OF TABLE 7.1

Yield (gm.)   $u_i = x_i - 1{,}300$   $u_i^2$       $u_i^3$          $u_i^4$
  $x_i$
  1,216           -84            7,056       -592,704       49,787,136
  1,374            74            5,476        405,224       29,986,576
  1,167          -133           17,689     -2,352,637      312,900,721
  1,232           -68            4,624       -314,432       21,381,376
  1,407           107           11,449      1,225,043      131,079,601
  1,453           153           23,409      3,581,577      547,981,281
  1,202           -98            9,604       -941,192       92,236,816
  1,372            72            5,184        373,248       26,873,856
  1,278           -22              484        -10,648          234,256
  1,141          -159           25,281     -4,019,679      639,128,961
  1,221           -79            6,241       -493,039       38,950,081
  1,329            29              841         24,389          707,281

  Total          -208          117,338     -3,114,850    1,891,247,942


From this table, one gets
$m_1' = \frac{1}{12} \times (-208) = -17.3333$ gm.,
$m_2' = \frac{1}{12} \times 117{,}338 = 9{,}778.1667$ (gm.)$^2$,
$m_3' = \frac{1}{12} \times (-3{,}114{,}850) = -259{,}570.8333$ (gm.)$^3$
and $m_4' = \frac{1}{12} \times 1{,}891{,}247{,}942 = 157{,}603{,}995.1667$ (gm.)$^4$.

Hence the mean, the variance (and the s.d.) and the third and
fourth central moments will be as follows :

$\bar{x} = 1{,}300 + m_1' = 1{,}282.67$ gm.,

$m_2 = m_2' - m_1'^2$
$= 9{,}778.1667 - 300.4433 = 9{,}477.7234$ (gm.)$^2$,
so that $s = \sqrt{9{,}477.7234} = 97.35$ gm.,

$m_3 = m_3' - 3m_2'm_1' + 2m_1'^3$
$= -259{,}570.8333 + 508{,}463.6906 - 10{,}415.348$
$= 238{,}477.51$ (gm.)$^3$

and $m_4 = m_4' - 4m_3'm_1' + 6m_2'm_1'^2 - 3m_1'^4$
$= 157{,}603{,}995.1667 - 17{,}996{,}876.4994$
$+ 17{,}626{,}708.0278 - 270{,}798.5295$
$= 156{,}963{,}028.17$ (gm.)$^4$.

The values of $\bar{x}$ and s computed by this short-cut method may be
compared with those obtained earlier directly (in Example 7.1 and
Example 8.3). The two sets of values are, to all intents and purposes, 
identical. The slight difference between the two values of the 
standard deviation is to be attributed to errors of approximation. 
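
The conversion can also be carried out mechanically from relation (9.3) and compared with central moments computed directly about the mean. The Python sketch below is illustrative: the function names are assumptions, full floating-point precision is used instead of the rounded intermediate figures, and the final digits therefore differ slightly from the hand computation.

```python
from math import comb, isclose

def moment_about(values, r, origin):
    """rth moment of `values` about `origin`, as in (9.1)."""
    return sum((x - origin) ** r for x in values) / len(values)

def central_from_raw(raw):
    """Central moment m_r from raw moments raw[0..r] (raw[0] = 1), by (9.3)."""
    r = len(raw) - 1
    m1 = raw[1]
    return sum((-1) ** j * comb(r, j) * raw[r - j] * m1 ** j for j in range(r + 1))

# Yields (gm.) of the 12 tomato plants, as listed in Table 9.1.
yields = [1216, 1374, 1167, 1232, 1407, 1453, 1202, 1372, 1278, 1141, 1221, 1329]

raw = [moment_about(yields, r, 1300) for r in range(5)]   # moments about 1,300 gm.
xbar = 1300 + raw[1]                                      # relation (9.4)

m2 = central_from_raw(raw[:3])
m3 = central_from_raw(raw[:4])
m4 = central_from_raw(raw[:5])

# Direct computation about the mean, for comparison.
direct = [moment_about(yields, r, xbar) for r in (2, 3, 4)]
print(round(xbar, 2), round(m2, 2), round(m3, 2), round(m4, 2))
```
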

In the case of grouped data, simplifications can be made in the 
calculation of moments if the deviations about the chosen origin 
are divided by a suitable number. The class-width may be used for 
this purpose when the class-intervals are equally wide. 

To guard against computational mistakes one may also use some 
checks which are due to Charlier. For a frequency distribution, 
these checks are based on the following relations : 

We have $\sum_i (u_i + 1)f_i = \sum_i u_i f_i + n$. This provides a check on the
value of $\sum_i u_i f_i$.

Again, $\sum_i (u_i + 1)^2 f_i = \sum_i u_i^2 f_i + 2\sum_i u_i f_i + n$. This can be used to
check the values of $\sum_i u_i f_i$ and $\sum_i u_i^2 f_i$.
And so on.
Example 9.2 Let us take for illustration the data on the heights 
of Indian adult males shown in Table 6.10. 
TABLE 9.2
CALCULATION OF MOMENTS FOR THE DATA OF TABLE 6.10

Class-mark   $f_i$   $u_i = (x_i - 162.05)/5$   $u_i f_i$   $u_i^2 f_i$   $u_i^3 f_i$   $u_i^4 f_i$   $(u_i+1)^4 f_i$
  $x_i$
 147.05        1       -3        -3        9      -27       81        16
 152.05        3       -2        -6       12      -24       48         3
 157.05       24       -1       -24       24      -24       24         0
 162.05       58        0         0        0        0        0        58
 167.05       60        1        60       60       60       60       960
 172.05       27        2        54      108      216      432     2,187
 177.05        2        3         6       18       54      162       512
 182.05        2        4         8       32      128      512     1,250
 Total       177                 95      263      383    1,319     4,986
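Here $\sum_i (u_i + 1)^4 f_i = 4{,}986,$
while $\sum_i u_i^4 f_i + 4\sum_i u_i^3 f_i + 6\sum_i u_i^2 f_i + 4\sum_i u_i f_i + n$

$= 1{,}319 + 1{,}532 + 1{,}578 + 380 + 177 = 4{,}986.$

The two values being equal, the computations may be supposed
to be free from errors.
Now, the raw moments are

$m_1' = \left(\frac{1}{n}\sum_i u_i f_i\right) \times 5$
$= \left(\frac{1}{177} \times 95\right) \times 5 = 0.53672 \times 5$ cm.,

$m_2' = \left(\frac{1}{n}\sum_i u_i^2 f_i\right) \times 5^2$
$= \left(\frac{1}{177} \times 263\right) \times 5^2 = 1.48588 \times 5^2$ (cm.)$^2$,

$m_3' = \left(\frac{1}{n}\sum_i u_i^3 f_i\right) \times 5^3$
$= \left(\frac{1}{177} \times 383\right) \times 5^3 = 2.16384 \times 5^3$ (cm.)$^3$

and $m_4' = \left(\frac{1}{n}\sum_i u_i^4 f_i\right) \times 5^4$
$= \left(\frac{1}{177} \times 1{,}319\right) \times 5^4 = 7.45198 \times 5^4$ (cm.)$^4$.

We have, therefore,
$\bar{x} = 162.05 + m_1' = 162.05 + 0.53672 \times 5$
$= 162.05 + 2.6836 = 164.734$ cm.,
$m_2 = m_2' - m_1'^2$
$= (1.48588 - 0.28807) \times 5^2$
$= 1.19781 \times 5^2 = 29.945$ (cm.)$^2$,
so that $s = \sqrt{29.945} = 5.472$ cm.,
$m_3 = m_3' - 3m_2'm_1' + 2m_1'^3$
$= (2.16384 - 2.39250 + 0.30923) \times 5^3$
$= 0.08057 \times 5^3 = 10.071$ (cm.)$^3$,
and $m_4 = m_4' - 4m_3'm_1' + 6m_2'm_1'^2 - 3m_1'^4$
$= (7.45198 - 4.64550 + 2.56822 - 0.24895) \times 5^4$
$= 5.12575 \times 5^4 = 3{,}203.594$ (cm.)$^4$.

The values of $\bar{x}$ and s may also be obtained by using formulae
(7.2) and (8.6). The two sets of values will be identical for all
practical purposes.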
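
The whole computation of this example, including the Charlier check, can be sketched in Python as follows. The class frequencies are an assumption: they are the values implied by the column totals quoted in the example (177, 95, 263, 383, 1,319 and 4,986), and the helper name `power_sum` is hypothetical.

```python
class_marks = [147.05, 152.05, 157.05, 162.05, 167.05, 172.05, 177.05, 182.05]
freqs = [1, 3, 24, 58, 60, 27, 2, 2]       # ASSUMED: implied by the quoted totals
origin, width = 162.05, 5

# Deviations from the origin in units of the class-width.
u = [round((x - origin) / width) for x in class_marks]
n = sum(freqs)

def power_sum(r, shift=0):
    """Sum of (u_i + shift)^r * f_i over all classes."""
    return sum((ui + shift) ** r * f for ui, f in zip(u, freqs))

# Charlier check for the fourth powers.
lhs = power_sum(4, shift=1)
rhs = power_sum(4) + 4 * power_sum(3) + 6 * power_sum(2) + 4 * power_sum(1) + n

# Raw moments in units of the class-width, then central moments by (9.5).
r1, r2, r3, r4 = (power_sum(r) / n for r in (1, 2, 3, 4))
xbar = origin + width * r1
m2 = width ** 2 * (r2 - r1 ** 2)
m3 = width ** 3 * (r3 - 3 * r2 * r1 + 2 * r1 ** 3)
m4 = width ** 4 * (r4 - 4 * r3 * r1 + 6 * r2 * r1 ** 2 - 3 * r1 ** 4)

print(lhs, rhs, round(xbar, 3), round(m2, 3), round(m3, 3), round(m4, 3))
```

Because no intermediate rounding takes place, the last digits of the third and fourth moments differ slightly from those obtained by hand.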

It is obvious from the above examples that for computational 
convenience the origin should be taken somewhere at the middle of 
the range of the given values of the variable. Secondly, it should be 
a round number (a class-mark in the case of grouped data), so that 
the deviations about it can be readily obtained. 




9.3 Moments about an arbitrary origin expressed in terms of 
central moments 
Just as a central moment can be expressed in terms of moments 
about an arbitrary origin, so a moment about an arbitrary origin is 
expressible in terms of central moments. 


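We have
$(x_i - A)^r = \{(x_i - \bar{x}) + (\bar{x} - A)\}^r$
$= \{(x_i - \bar{x}) + d\}^r,$
where $d = \bar{x} - A$.
Thus

$(x_i - A)^r = (x_i - \bar{x})^r + \binom{r}{1}(x_i - \bar{x})^{r-1}d + \binom{r}{2}(x_i - \bar{x})^{r-2}d^2 + \ldots$
$+ \binom{r}{r-1}(x_i - \bar{x})d^{r-1} + d^r.$

Taking the sum of each side for all i from 1 to n, and dividing
by n, we get

$m_r' = m_r + \binom{r}{1}m_{r-1}d + \binom{r}{2}m_{r-2}d^2 + \ldots$

$+ \binom{r}{r-1}m_1 d^{r-1} + d^r.$ ... (9.6)

The last term but one is, of course, zero since $m_1 = 0$.
In particular,

$m_1' = d,$
$m_2' = m_2 + d^2,$ ... (9.7)
$m_3' = m_3 + 3m_2 d + d^3,$

and $m_4' = m_4 + 4m_3 d + 6m_2 d^2 + d^4.$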

9.4 Sheppard’s corrections for moments 


In computing moments for data grouped into class-intervals by
means of the formulae

$m_r' = \frac{1}{n}\sum_i (x_i - A)^r f_i$ and $m_r = \frac{1}{n}\sum_i (x_i - \bar{x})^r f_i,$

we are acting as if the observations falling in a class (e.g. the $f_i$ values
falling in the ith class) were all equal to the class-mark, although the
observations may be really unequal. The assumption naturally
introduces some errors, which are called the errors due to grouping. To
correct for these grouping errors, the computed values of the
moments have to be suitably adjusted. A method for adjusting the
moments for grouped data where the classes are equally wide has
been developed by Sheppard. Sheppard's corrections for moments
about an arbitrary origin and for central moments of the first four
orders are given below :

$m_1'$ (corrected) $= m_1',$

$m_2'$ (corrected) $= m_2' - \frac{c^2}{12},$

$m_3'$ (corrected) $= m_3' - \frac{c^2}{4}m_1',$ ... (9.8)

$m_4'$ (corrected) $= m_4' - \frac{c^2}{2}m_2' + \frac{7}{240}c^4$
and
$m_2$ (corrected) $= m_2 - \frac{c^2}{12},$
$m_3$ (corrected) $= m_3,$ ... (9.9)
$m_4$ (corrected) $= m_4 - \frac{c^2}{2}m_2 + \frac{7}{240}c^4,$

where c is the width of each class-interval.
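
The corrections (9.9) for the central moments amount to a few lines of arithmetic. In the Python sketch below the function name and the input figures (uncorrected moments 30, 10 and 3,200 with class-width 5) are hypothetical, chosen only to show the size of the adjustments.

```python
def sheppard_central(m2, m3, m4, c):
    """Sheppard-corrected central moments (9.9) for equal class-width c."""
    m2_corrected = m2 - c ** 2 / 12
    m3_corrected = m3                       # the third moment needs no correction
    m4_corrected = m4 - (c ** 2 / 2) * m2 + 7 * c ** 4 / 240
    return m2_corrected, m3_corrected, m4_corrected

# Hypothetical uncorrected moments from a table with class-width 5.
m2_c, m3_c, m4_c = sheppard_central(30.0, 10.0, 3200.0, 5)
print(round(m2_c, 4), m3_c, round(m4_c, 4))
```
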

These corrections will be valid only if certain conditions are 
fulfilled. First, it is necessary that the observations should relate to 
a continuous variable. Second, the frequency curve of the distribu- 
tion should be continuous and should have high-order contact at 
each end of the range of the variable. The second condition 
cannot, of course, be verified when only a finite number of obser- 
vations are available, which is usually the case. A fair indication 
may, however, be obtained from a frequency table of the data, 
provided a sufficiently large number of observations are given. Thus, 
if the class-frequencies decrease towards the two ends of the range 
smoothly and gradually (as in Table 6.10 or Table 9.3), the 
condition may be supposed to be fulfilled. 

Two other conditions should be satisfied by the observed data in 
order that the use of Sheppard’s corrections may be worth while. It 
is, in the first place, desirable that the total frequency should be 
sufficiently large ; otherwise, the moments will be more affected 
by sampling errors (vide Chapter 15) than by grouping errors.
Furthermore, the width of the class-intervals should not be too small
compared to the range of variation of the data (in other words, the
number of classes should not be too large); for otherwise Sheppard’s
corrections will make little difference to the uncorrected values of the
moments. As a general rule, these corrections should not be applied
unless the total frequency is higher than 1,000 and the number of
classes is smaller than 20.

The use of Sheppard’s corrections is illustrated in the following
example:

Example 9.3 During a crop-cutting survey on rice in a State of
India, 1,175 cuts (each of size 1/3,200th of an acre) were taken.
The yield of rice per cut for the 1,175 cuts is shown in Table 9.3 in
the form of a frequency table.


TABLE 9.3

FREQUENCY DISTRIBUTION OF RICE YIELD FOR 1,175
CUTS OF SIZE 1/3,200TH OF AN ACRE

Yield of rice (md.)      Frequency
(class-mark)
     1.25                     5
     3.75                    37
     6.25                    68
     8.75                   131
    11.25                   142
    13.75                   158
    16.25                   186
    18.75                   171
    21.25                    99
    23.75                    75
    26.25                    35
    28.75                    27
    31.25                    18
    33.75                    12




For this distribution, the uncorrected moments are found to be:
    \bar{x} = 15.8989 md.,
    m_2 = 45.9139 (md.)^2,
    m_3 = 170.6747 (md.)^3
and m_4 = 7,272.9613 (md.)^4.


Here we may apply Sheppard’s corrections, because the given 
figures are such that the four conditions noted above may be taken 
to be fulfilled. We then have 


    \bar{x} (corrected) = 15.899 md.,

    m_2 (corrected) = 45.9139 - \frac{(2.5)^2}{12}
                    = 45.393 (md.)^2,
so that s (corrected) = \sqrt{45.393} = 6.737 md.,
    m_3 (corrected) = 170.675 (md.)^3

and m_4 (corrected) = 7,272.9613 - \frac{1}{2} × (2.5)^2 × 45.9139 + \frac{7}{240} × (2.5)^4
                    = 7,272.9613 - 143.4809 + 1.1393
                    = 7,130.620 (md.)^4.
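
The arithmetic of Example 9.3 can be reproduced with a few lines of Python. This is only a sketch: the function name `sheppard_central` is a hypothetical helper, and it simply encodes the corrections (9.9) for central moments with class width c = 2.5 md.

```python
import math


def sheppard_central(m2, m3, m4, c):
    """Apply Sheppard's corrections (9.9) to central moments.

    c is the common width of the class-intervals.
    """
    m2_c = m2 - c ** 2 / 12
    m3_c = m3                               # third central moment is unchanged
    m4_c = m4 - 0.5 * c ** 2 * m2 + 7 / 240 * c ** 4
    return m2_c, m3_c, m4_c


# Uncorrected moments quoted in Example 9.3
m2_c, m3_c, m4_c = sheppard_central(45.9139, 170.6747, 7272.9613, 2.5)
s_c = math.sqrt(m2_c)

print(round(m2_c, 3), round(s_c, 3), round(m3_c, 3), round(m4_c, 3))
```

The printed values should agree with the corrected figures of the example to three decimal places.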


9.5 Skewness 

By skewness of a frequency distribution we mean the degree
of its departure from symmetry. The frequency distribution of a
discrete variable x is called symmetrical about the value x_0 if the
frequency of x_0 - h is the same as the frequency of x_0 + h, whatever h
may be. In the case of a continuous variable, the term ‘symmetry’
should be used in relation to its frequency curve (vide Section 7.5).
The frequency curve of a continuous variable is said to be symmetrical
about x_0 if the frequency-density at x_0 - h is the same as the
frequency-density at x_0 + h, whatever h may be. Figures 9.1a and
9.1b show two symmetrical distributions.

[Fig. 9.1a A symmetrical distribution (discrete variable); frequency plotted against the variable. Fig. 9.1b A symmetrical distribution (continuous variable); frequency-density plotted against the variable.]

A distribution which is not symmetrical is called asymmetrical or
skew. This skewness is said to be positive if the longer tail of the
distribution is towards the higher values of the variable (Fig. 9.2a),
and negative if the longer tail is towards the lower values of the
variable (Fig. 9.2b).

[Fig. 9.2a A positively skew distribution. Fig. 9.2b A negatively skew distribution.]


An important point to be noted in this connection is that all
odd-order central moments are zero for a symmetrical distribution,
positive for a positively skew distribution and negative for a
negatively skew distribution. Any such moment may, therefore, be
considered a measure of the skewness of a distribution except, of
course, m_1, which is necessarily zero for any distribution, symmetrical
or otherwise. The simplest of these measures is m_3. To make this
measure free from the units of the variable, we divide it by s^3 and
thus get an absolute measure* :

    g_1 = \frac{m_3}{s^3}.    ... (9.10)


An alternative measure of skewness is obtained from the relative
positions of the mean and the mode in a distribution.

* This and the two subsequent measures involve the assumption that s > 0 (i.e. the assumption that x is not a constant in the given set).


In a symmetrical distribution, the mean, median and mode 
(assuming the distribution to be unimodal) coincide. If the distri- 
bution is positively skew, then 


mean > median > mode, 
and if it is negatively skew, then 
mean < median < mode. 


Hence the difference (mean - mode), divided by the s.d., is taken
as a measure of skewness :

    Sk = \frac{\bar{x} - Mo}{s}.    ... (9.11)


Since it is difficult to estimate the mode from a frequency distribution,
the empirical relation (7.9) is used to get another measure of
skewness, viz.

    Sk = \frac{3(\bar{x} - Mi)}{s}.    ... (9.12)


A fourth measure of skewness is obtained by considering the 
relative positions of the three quartiles of a frequency distribution. 
For a symmetrical distribution the lower and upper quartiles are 
equidistant from the median ; for a positively skew distribution the 
lower quartile is nearer the median than the upper quartile is, while 
for a negatively skew distribution the upper quartile is nearer. 

Thus (Q_3 - Mi) - (Mi - Q_1) may be taken as a measure of
skewness. It is expressed as a pure number on being divided by

    (Q_3 - Mi) + (Mi - Q_1) = Q_3 - Q_1 = 2Q,

which is assumed to be non-zero. Thus the new measure is

    Sk = \frac{(Q_3 - Mi) - (Mi - Q_1)}{2Q}.    ... (9.13)


The measure given by equation (9.10) can theoretically assume
any value between -\infty and \infty, but in practice its numerical value is
rarely very high. The measure (9.12) can vary between -3 and 3.
The same may be said to be approximately the case with (9.11)
because of the empirical relation (7.9), which is valid for moderately
skew distributions. As regards (9.13), it has the limits -1 and 1.


Example 9.4 We may calculate the various measures of skewness for
the frequency distribution of Table 6.10. It has been found already
that for this distribution

    \bar{x} = 164.734 cm.,   Mo = 164.836 cm.,
    Mi = 164.758 cm.,   Q_1 = 160.951 cm.,
and Q_3 = 168.445 cm., while s = 5.472 cm.

According to formula (9.11),

    Sk = \frac{\bar{x} - Mo}{s} = -0.102/5.472 = -0.019.

Formula (9.12) gives

    Sk = \frac{3(\bar{x} - Mi)}{s} = -0.072/5.472 = -0.013.

Again, from formula (9.13),

    Sk = \frac{(Q_3 - Mi) - (Mi - Q_1)}{Q_3 - Q_1} = -0.119/7.495 = -0.016.

Now consider the other measure of skewness, viz. (9.10). For the
present distribution, m_3 = 0.08057 × 5^3 (cm.)^3 and m_2 = 1.19781 × 5^2
(cm.)^2. Hence

    g_1 = \frac{m_3}{m_2^{3/2}} = \frac{0.08057}{(1.19781)^{3/2}}
and log g_1 = log 0.08057 - \frac{3}{2} × log 1.19781
            = \bar{2}.7885916 = log 0.06146.

Thus g_1 = 0.061.
Thus all the measures are nearly equal to zero, indicating that
the distribution is almost symmetrical.
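
The four measures of Example 9.4 can be recomputed directly from the quoted summary figures. The sketch below uses only the rounded values printed in the example, so the quartile measure may differ from the text in the last digit of the intermediate steps; the variable names are chosen here for illustration.

```python
# Summary figures quoted in Example 9.4 (all in cm.)
mean, mode, median = 164.734, 164.836, 164.758
q1, q3, s = 160.951, 168.445, 5.472

sk_mode = (mean - mode) / s                                  # formula (9.11)
sk_median = 3 * (mean - median) / s                          # formula (9.12)
sk_quartile = ((q3 - median) - (median - q1)) / (q3 - q1)    # formula (9.13)

# Formula (9.10): g1 = m3 / m2^(3/2).  With m3 = 0.08057 * 5^3 and
# m2 = 1.19781 * 5^2 the factor 5^3 cancels between numerator and denominator.
g1 = 0.08057 / 1.19781 ** 1.5

for name, value in [("(9.11)", sk_mode), ("(9.12)", sk_median),
                    ("(9.13)", sk_quartile), ("(9.10)", g1)]:
    print(name, round(value, 3))
```

All four values come out close to zero, in line with the conclusion of the example.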
Example 9.5 The frequency distribution of Table 9.3 may now
be considered. Here
    m_3 = 170.6747 (md.)^3 and m_2 = 45.3931 (md.)^2,

so that g_1 = \frac{170.6747}{(45.3931)^{3/2}}
and log g_1 = log 170.6747 - \frac{3}{2} × log 45.3931
            = 2.2321699 - \frac{3}{2} × 1.6569899
            = \bar{1}.7466851 = log 0.5581.

Thus g_1 = 0.558, indicating that the distribution is positively
skew. This is also apparent from the table itself.


9.6 Kurtosis 
Another method of describing a frequency distribution is to 
specify its degree of peakedness or kurtosis. Two distributions may 
have the same mean and the same standard deviation and may be 
equally skew, but one of them may be more peaked than the other. 
This feature of the frequency distribution is measured by* 


    g_2 = \frac{m_4}{s^4} - 3.    ... (9.14)


Obviously, it is a pure number. For a normal distribution, to
be discussed in the next chapter, this measure has the value zero.
A positive value of g_2 indicates that the distribution has high
concentration of values near the central tendency and has high tails,
in comparison with a normal distribution with the same standard
deviation. In the same way, a negative value of g_2 means that the
distribution has low concentration of values in the neighbourhood
of the central tendency and low tails, compared to a normal
distribution with the same standard deviation. A normal curve is
said to be mesokurtic (i.e. having medium kurtosis). A distribution
with positive g_2 is called leptokurtic, and one with negative g_2 is
known as platykurtic.

* This again involves the assumption that s > 0.

[Fig. 9.3 Three symmetrical distributions with different degrees of kurtosis: (a) mesokurtic, (b) leptokurtic, (c) platykurtic.]

The quantities g_1^2 and g_2 + 3 themselves are sometimes used as
measures of skewness and kurtosis, respectively. They are referred
to as the b_1 and b_2 coefficients. Thus

    b_1 = \frac{m_3^2}{m_2^3}    ... (9.15)
and b_2 = \frac{m_4}{m_2^2}.    ... (9.16)


That the fourth central moment (m_4) may be used in measuring
kurtosis becomes obvious from the fact that the higher the kurtosis,
the higher will be the effect of the large deviations (from the mean)
in the tails when raised to the fourth power. Division of m_4 by s^4
makes the measure a pure number.

Actually, however, b_2 (or g_2) will be appropriate as a measure of
kurtosis or peakedness only if we confine our attention to the class
of the usual bell-shaped (or unimodal) distributions. Otherwise, it
may only serve to distinguish a unimodal distribution from a
bimodal. This will be apparent from Exercise 9.3.

Example 9.6 Consider once again the frequency distribution of
height for Indian adult males given in Table 6.10. Here

    m_4 = 5.12575 × 5^4 (cm.)^4
and m_2 = 1.19781 × 5^2 (cm.)^2.
Therefore,

    b_2 = \frac{m_4}{m_2^2} = 5.12575/(1.19781)^2

and log b_2 = log 5.12575 - 2 × log 1.19781.
Thus
    b_2 = 3.573 and g_2 = 0.573,
indicating that the distribution is slightly leptokurtic.


Example 9.7 For the frequency distribution of Table 9.3,
    m_4 = 7,130.620 (md.)^4 and m_2 = 45.393 (md.)^2.

Hence
    b_2 = \frac{m_4}{m_2^2} = \frac{7130.620}{(45.393)^2}
and log b_2 = log 7130.620 - 2 × log 45.393
            = 3.8531273 - 2 × 1.6569889
            = 0.5391495 = log 3.4606.
Thus
    b_2 = 3.461 and g_2 = 0.461,
which indicate that this distribution also is slightly leptokurtic.
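
With a calculator or computer the logarithms are unnecessary. As a minimal sketch, the moment-based coefficients for the rice-yield distribution (Examples 9.5 and 9.7) can be obtained directly from the Sheppard-corrected moments quoted in the text; the variable names below are illustrative.

```python
# Sheppard-corrected central moments quoted in Examples 9.5 and 9.7
m2, m3, m4 = 45.3931, 170.6747, 7130.620

g1 = m3 / m2 ** 1.5      # skewness, formula (9.10) with s^3 = m2^(3/2)
b1 = g1 ** 2             # formula (9.15)
b2 = m4 / m2 ** 2        # formula (9.16)
g2 = b2 - 3              # kurtosis, formula (9.14)

print(round(g1, 3), round(b1, 3), round(b2, 3), round(g2, 3))
```

The values of g_1, b_2 and g_2 agree with those found by logarithms in the two examples, and the inequality b_2 - b_1 - 1 > 0 of Exercise 9.3 is seen to hold for these figures.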


Questions and exercises 


9.1 Define the moments of a frequency distribution and explain
their usefulness in describing the location and shape of the frequency
distribution.

9.2 What are skewness and kurtosis? Give some suitable 
measures for skewness and kurtosis. 

9.3 Using Cauchy-Schwarz inequality, or otherwise, prove that

(i) b_2 \geq 1 and (ii) b_2 - b_1 - 1 \geq 0.

Discuss in detail the cases where b_2 = 1 and b_2 - b_1 - 1 = 0.

9.4 Suppose the values of a variable are grouped into a frequency
table, the width of each class being less than one-third of the
standard deviation. Show that Sheppard’s correction in such a case
will make little difference (a difference of less than 0.5%) in the
uncorrected value of the standard deviation.


9.5 Let S = \sum_i |x_i - A| f_i be the sum of absolute deviations about
the mid-point A of the class in which our chosen average m lies. If
d = m - A, show that the mean deviation about m will be

    \frac{1}{n} \left[ S + (n_1 - n_2) d + \frac{n_0}{h} \left( d^2 + \frac{h^2}{4} \right) \right],

where n_0 = frequency in the class containing m, h = width of the
class-interval, n_1 = total frequency in all lower classes and n_2 = total
frequency in all higher classes, provided we assume a uniform
distribution of frequency in each class.




9.6 Show that the measure of skewness given by (9.12) must lie
between -3 and 3 and that the measure given by (9.13) must lie
between -1 and 1.


9.7 Prove, by a geometrical argument, that for a J-shaped 
distribution with its longer tail towards the higher values of the 
variable, the median is nearer to the first quartile than to the third. 
(A similar argument can be used to show that for the other type of 
J-shaped distribution, the median is nearer to the third quartile 
than to the first.) 


9.8 Consider any symmetrical frequency distribution for a dis- 
crete variable. Show that its central moments of odd orders must 
all be zero, 


9.9 Compute \bar{x}, s, m_3 and m_4 for the data on length of earhead
given in Exercise 7.16.

Ans. 9.9; 0.91; -0.061; 1.56 (in proper units).

9.10 For the frequency distribution of Exercise 7.18, compute the
mean and the central moments up to that of the fourth order.

Ans. 3.763; 3.454; -1.736; 27.236.

9.11 Determine the mean and the central moments up to that
of the fourth order for the frequency distribution of Exercise 7.19.
For the same distribution compute the various measures of skewness
and kurtosis.

Partial ans. \bar{x} = 108.481; m_2 = 297.748; m_3 = 99.823; m_4 = 375,193.6.

9.12 Consider the income-distribution of Exercise 7.20 and measure
its skewness by means of an appropriate formula.

Ans. Sk = -0.202, by formula (9.13).

9.13 The scores in English of 250 candidates appearing at an
examination have
    mean = 39.72, m_2 = 97.80, m_3 = -114.18 and m_4 = 28,396.14.

It is later found on scrutiny that the score 61 of a candidate has
been wrongly recorded as 51. Make necessary corrections in the
given values of the mean and the central moments.

Ans. Correct values are 39.76; 99.10; -93.27; 29,165.60.




9.14 Particulars relating to the monthly wage distributions of two 
manufacturing firms are given below : 


                        Firm A                Firm B
Mean wage               Rs. 377               Rs. 385
Median wage             Rs. 371               Rs. 362
Modal wage              Rs. 360               Rs. 351
Quartiles               Rs. 362 and 278       Rs. 358 and 390
Standard deviation      Rs. 32                Rs. 39


Compare the two distributions. 


SUGGESTED READING 


[1] Kenney, J. F. and Keeping, E. S. Mathematics of Statistics, Part I
(Ch. 7). Van Nostrand, 1954, and Affiliated East-West Press.

[2] Mills, F. C. Statistical Methods (Ch. 5). H. Holt, 1955.

[3] Yule, G. U. and Kendall, M. G. Introduction to the Theory of
Statistics (Ch. 6). Charles Griffin, 1953.


10 UNIVARIATE THEORETICAL DISTRIBUTIONS


10.1 Introduction 

Every statistical enquiry is designed to gather information about
some aggregate of individuals (individual objects or beings) rather
than about the individuals themselves. In statistical language, such
an aggregate is called a population or universe. Thus, for example,
the enquiry may relate to the human population of India, the
population of tigers in the Sunderbans, the population of mini-buses
in Calcutta, the population of products turned out by a machine or
even the population of all throws that may possibly be made with the
25-paise coin in my pocket. In most situations, the population may
be considered infinitely large. This will obviously be the case for the
population of possible throws that may be made with the 25-paise
coin. Even when the enquiry concerns such things as the products
of a machine, the underlying population may be regarded as infinite.
For the performance of the machine is to be judged in the light of
not only the articles actually produced by the machine, but also the
articles that it might have produced in the past (but did not) and also the
articles that it may produce in the future. The notion of an infinite
population is, then, basic to most statistical enquiries. The population
has to be kept in mind, although the data in the hands of the
enquirer will actually relate to only a small number of members of
the population, collectively called a sample.

In advanced treatment of the data thrown up by an enquiry,
we have often to use mathematical representations of the frequency
distribution of one or more variables in the population concerned.
Theoretical distributions of the variable or variables are meant to
serve precisely this purpose. We are going to discuss in this
chapter theoretical distributions for a single variable.


10.2 Probability-mass and probability-density functions 
Consider first a discrete variable x. An ordinary frequency
distribution of x would be described by specifying the frequencies
corresponding to the different possible values of x. Noting, however,
that it is the relative sizes of the frequencies that are of importance
in specifying a distribution, an alternative (and in some ways better)
mode of representing the frequency distribution would be to state
the relative frequencies corresponding to the different values of x.
Remembering that the ‘limiting form’ of a relative frequency is the
corresponding probability, it would then seem natural to represent
the distribution of x in the infinite population in terms of the
probabilities of the different values of x, i.e. to represent it as the
probability distribution of x. We, therefore, look for a function f(x)
that will give for any specified value of x the corresponding
probability ; thus we shall have, for any k,

    f(k) = P[x = k].

This function is said to be the probability-mass function (p.m.f.) of
x and has to satisfy the conditions

    f(x) \geq 0, for any x whatsoever,    ... (10.1)
and \sum_x f(x) = 1,    ... (10.2)


the sum being taken over all the values of x having positive 
probabilities (called the mass-points of x). 

In case x is a continuous variable, we may likewise seek to represent
its distribution in the infinite population by dividing the whole
range of x into suitable class intervals and specifying the probability
for each interval. However, since the mode of classification will in
any case be arbitrary, such a representation is bound to be only
partial. To give a complete representation, therefore, we look for a
function f(x) such that for any interval whatsoever the probability
will be given by the integral of f(x) over that interval, i.e. by the
area between the x-axis and the curve of f(x) which is included in
the interval. Thus, for example,

    \int_a^b f(x) dx = P[a \leq x \leq b],

the probability for x to lie in the interval from a to b (a < b).


Note that the function f(x) itself has to be continuous or at least
piece-wise continuous. Further, the probability for x to take any
specific value must be zero, so that here P[a \leq x \leq b], P[a < x \leq b],
P[a \leq x < b] and P[a < x < b] are all equal to

    \int_a^b f(x) dx.

This function f(x) is said to be the probability-density function
(p.d.f.) of x and satisfies the conditions

    f(x) \geq 0, for any x whatsoever,    ... (10.3)
and \int_{-\infty}^{\infty} f(x) dx = 1,    ... (10.4)

where the range of possible values of x is taken, with no loss of
generality, to be from -\infty to \infty.
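
Conditions (10.1)-(10.4) are easy to check numerically for specific functions. The sketch below is illustrative only: the p.m.f. of a fair die and the density f(x) = e^{-x} for x >= 0 (zero elsewhere) are assumed examples, not distributions discussed in the text, and `integrate` is a simple midpoint rule written for this purpose.

```python
import math


def integrate(f, a, b, steps=100_000):
    """Midpoint-rule approximation to the integral of f over [a, b]."""
    h = (b - a) / steps
    return h * sum(f(a + (i + 0.5) * h) for i in range(steps))


# Assumed discrete example: a fair die, with mass-points 1, ..., 6
pmf = {k: 1 / 6 for k in range(1, 7)}


# Assumed continuous example: f(x) = exp(-x) for x >= 0, and 0 otherwise
def pdf(x):
    return math.exp(-x) if x >= 0 else 0.0


total_mass = sum(pmf.values())            # condition (10.2)
total_area = integrate(pdf, 0.0, 50.0)    # condition (10.4); the tail beyond 50 is negligible
prob_1_to_2 = integrate(pdf, 1.0, 2.0)    # P[1 <= x <= 2] as an area under the curve

print(total_mass, total_area, prob_1_to_2)
```

For the assumed density the last value can be compared with the exact figure e^{-1} - e^{-2}.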

Population distributions have to be taken into account in
statistical inference, i.e. in inferring the nature of a population from
the nature of a sample. In order to simplify the mathematical
treatment of inference problems, the function f(x) in both discrete
and continuous cases is taken to be of a sufficiently simple form.
The inference problem then concerns the parameters occurring in
f(x). Distributions defined in this way are called theoretical, because
they are ideal distributions that are hardly expected to reflect in toto
the true nature of the population distribution. They are meant
simply to give a fairly close approximation to the actual distribution
of the variable in the population. They are like the straight lines,
circles, rectangles, etc., of geometry, which are abstract, hypothetical
entities but serve as good models for certain figures that we
encounter in real life.


10.3 Characteristics of a theoretical distribution 
It is natural that characteristics like the mean and moments of a 
theoretical distribution should be defined in terms of probabilities 
just as the distribution itself is defined in terms of probabilities. 
Consider first a discrete variable x. For an observed distribution,
the (arithmetic) mean of x would be

    \bar{x} = \frac{1}{n} \sum_i x_i f_i = \sum_i x_i (f_i/n).

Replacing the relative frequency f_i/n by its ‘limit’, i.e. by the
probability of x_i, we have as the ‘limiting value’ of \bar{x}
    \mu = \sum_i x_i P[x = x_i],
which is simply the expectation, E(x), of x. Hence if the p.m.f. of
the theoretical distribution be f(x), then the mean of x must be
    \mu = E(x) = \sum_x x f(x),    ... (10.5)

the sum being taken over all the mass-points of x. In the same
way, the rth moment of the distribution about 0 will be

    \mu'_r = E(x^r) = \sum_x x^r f(x),    ... (10.6)
while the rth central moment will be
    \mu_r = E(x - \mu)^r = \sum_x (x - \mu)^r f(x).    ... (10.7)
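
Formulae (10.5)-(10.7) translate directly into sums over the mass-points. The following minimal sketch assumes, purely as an example, the p.m.f. of a fair die; the helper names `raw_moment` and `central_moment` are illustrative.

```python
def raw_moment(pmf, r):
    """r-th moment about 0, formula (10.6): sum of x^r f(x) over the mass-points."""
    return sum(x ** r * p for x, p in pmf.items())


def central_moment(pmf, r):
    """r-th central moment, formula (10.7)."""
    mu = raw_moment(pmf, 1)          # the mean, formula (10.5)
    return sum((x - mu) ** r * p for x, p in pmf.items())


# Assumed example: a fair die
pmf = {k: 1 / 6 for k in range(1, 7)}

mu = raw_moment(pmf, 1)
mu2 = central_moment(pmf, 2)
mu3 = central_moment(pmf, 3)
print(mu, mu2, mu3)
```

For this symmetrical distribution the mean is 3.5, the variance is 35/12, and the third central moment vanishes up to rounding.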


Next, take the case of a continuous variable x. Let the range
of x (from \alpha to \beta) be divided into m equal parts by points x_0 = \alpha,
x_1 = \alpha + h, x_2 = \alpha + 2h, ..., x_m = \alpha + mh = \beta. Then noting that for a
small interval of width h, say for c - h/2 \leq x \leq c + h/2,

    \int_{c-h/2}^{c+h/2} f(x) dx = h f(c), approximately,

the sum h \sum_{i=0}^{m} x_i f(x_i) will roughly correspond to the sum \sum_x x f(x) of
the discrete case. The limit of this quantity for increasing m (or
decreasing h) will be

    \int_{\alpha}^{\beta} x f(x) dx.    ... (10.8a)

This is now called the expectation of x and also serves as the mean
value \mu of the theoretical distribution. If the actual range of x is
(-\infty, \beta), (\alpha, \infty) or (-\infty, \infty) we consider the limit of (10.8a) for
\alpha \to -\infty or for \beta \to \infty or for both \alpha \to -\infty, \beta \to \infty. Thus, with no
loss of generality, we have, for a continuous x,

    \mu = E(x) = \int_{-\infty}^{\infty} x f(x) dx.    ... (10.8)




[In case the actual range of x is infinite, the mean or expectation
of x may not exist. It exists if and only if the integral
\int_{-\infty}^{\infty} x f(x) dx is absolutely convergent. Similarly, for the
discrete case, if the number of mass-points is infinite, the mean exists
if and only if the series \sum_x x f(x) is absolutely convergent.]

In the same way, the rth moment about 0 (if it exists) is

    \mu'_r = E(x^r)
           = \int_{-\infty}^{\infty} x^r f(x) dx    ... (10.9)
and the rth central moment (if it exists) is
    \mu_r = E(x - \mu)^r
          = \int_{-\infty}^{\infty} (x - \mu)^r f(x) dx.    ... (10.10)


In determining the median or mode, too, we now make use of the
probabilities or probability-densities of the different values of x.
Thus the mode is the value with the highest probability or
probability-density, according as x is discrete or continuous. In the
discrete case, this value, say \mu_0, is to be found by comparing the
probabilities for the different mass-points of x. In the continuous
case, \mu_0, being the value at which the p.d.f. f(x) is a maximum, may,
again, be found by inspection. In particular, if the p.d.f. f(x) has
derivatives of the first and second orders at the modal value, \mu_0,
then \mu_0 can be determined from the relations
    f'(\mu_0) = 0,  f''(\mu_0) < 0.    ... (10.11)
The median may be defined in terms of the cumulative probabilities
for the different values of x. Consider then the function F(x)
such that

    F(k) = P[x \leq k]
         = \sum_{x \leq k} f(x)             if x is discrete,
         = \int_{-\infty}^{k} f(x) dx       if x is continuous.    ... (10.12)




This function is called the distribution function (or the cumulative
distribution function) of x. For the discrete case, the median \mu_e is
the value (mass-point) such that the probability for x to be smaller
than or equal to \mu_e is at least 1/2, while the probability for x to be
strictly smaller than \mu_e is less than 1/2. In other words, \mu_e is such
that

    F(\mu_e - 0) < \frac{1}{2} \leq F(\mu_e).    ... (10.13)

In the continuous case, on the other hand, \mu_e is the value of x such
that the probability for x to be smaller than or equal to \mu_e is exactly
1/2. Thus \mu_e is such that

    F(\mu_e) = \frac{1}{2}
or, in other words,

    \int_{-\infty}^{\mu_e} f(x) dx = \frac{1}{2}.    ... (10.14)
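
For a discrete variable the mode and the median can be found by a direct search over the mass-points. The sketch below uses a small made-up p.m.f. (the probabilities are hypothetical) and locates the median as the smallest mass-point whose cumulative probability reaches 1/2, which is what condition (10.13) requires.

```python
def distribution_function(pmf, k):
    """F(k) = P[x <= k] for a discrete variable, formula (10.12)."""
    return sum(p for x, p in pmf.items() if x <= k)


def mode(pmf):
    """Mass-point with the highest probability."""
    return max(pmf, key=pmf.get)


def median(pmf):
    """Smallest mass-point m with F(m) >= 1/2, so that F(m - 0) < 1/2 <= F(m)."""
    for x in sorted(pmf):
        if distribution_function(pmf, x) >= 0.5:
            return x
    raise ValueError("probabilities do not add up to at least 1/2")


# Hypothetical p.m.f. used only for illustration
pmf = {0: 0.1, 1: 0.3, 2: 0.4, 3: 0.2}

print(mode(pmf), median(pmf), distribution_function(pmf, 1))
```

Here F(1) is 0.4 and F(2) is 0.8, so the median is the mass-point 2, which in this example is also the mode.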


All these measures have the same properties as in the case of an
observed distribution. Thus, for instance, we have, corresponding
to equation (9.3) for an observed distribution,

    \mu_r = \mu'_r - \binom{r}{1} \mu'_{r-1} \mu + \binom{r}{2} \mu'_{r-2} \mu^2 - \cdots + (-1)^{r-1} (r-1) \mu^r.    ... (10.15)


Some important theoretical distributions, that are employed even 
in ordinary statistical work, will be discussed in some detail in this 
chapter. 


10.4 The hypergeometric distribution 
This is a distribution of the discrete type and has the p.m.f.

    f(x) = \binom{Np}{x} \binom{Nq}{m-x} / \binom{N}{m}    if x = 0, 1, 2, ..., m
         = 0    otherwise,    ... (10.16)

where N and m are positive integers (m < N), 0 < p < 1 and
q = 1 - p.

Here the constants N, m and p are such that if any of them takes
a different value, then a different hypergeometric distribution is
obtained. Such constants are called the parameters of this type of
distribution.
Note that all of the values 0, 1, ..., m may not have positive
probabilities. For in case x > Np, one has \binom{Np}{x} = 0, and in case
m - x > Nq, i.e. x < m - Nq, one has \binom{Nq}{m-x} = 0. Hence the
mass-points of the distribution are those integers x which satisfy the
inequality m - Nq \leq x \leq Np.

Obviously, \sum_x f(x) = 1, since \sum_x \binom{Np}{x} \binom{Nq}{m-x}, being the
coefficient of t^m in the expansion of (1+t)^{Np} (1+t)^{Nq} = (1+t)^N, equals \binom{N}{m}.
Again, (10.16) for x = 0, 1, ..., m may be written in the form

    \frac{(Nq)_m}{(N)_m} × \frac{(m)_x (Np)_x}{x! \, (Nq - m + x)_x},    ... (10.17)

where (a)_x denotes the product a(a-1) \cdots (a-x+1).


The second factor of (10.17), it may be noted, is the coefficient of
t^x in the hypergeometric series

    F(\alpha, \beta; \gamma; t) = 1 + \frac{\alpha \beta}{1! \, \gamma} t + \frac{\alpha(\alpha+1) \beta(\beta+1)}{2! \, \gamma(\gamma+1)} t^2 + \cdots    ... (10.18)

if we make the substitutions

    \alpha = -m,  \beta = -Np  and  \gamma = Nq - m + 1.

Thus (10.17) is the coefficient of t^x in the expansion of

    \frac{(Nq)_m}{(N)_m} × F(-m, -Np; Nq - m + 1; t).    ... (10.19)

This is the reason why the distribution defined by (10.16) is called
the hypergeometric distribution.

It will be clear from Example 3.5 that this will be the appropriate
theoretical distribution when one is taking samples of size m each at
random and without replacements from a lot of N items, of which Np
are of one kind (say defectives) and the remaining Nq are of
another kind (say non-defectives), and the variable of interest is the
number of items of the first kind included in the sample.




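The mean of this distribution is

    E(x) = \sum_{x=0}^{m} x × \frac{m!}{x!(m-x)!} × \frac{(Np)_x (Nq)_{m-x}}{(N)_m}

         = \sum_{x=1}^{m} \frac{m!}{(x-1)!(m-x)!} × \frac{(Np)_x (Nq)_{m-x}}{(N)_m}

         = \frac{m Np}{N} \sum_{x=1}^{m} \frac{(m-1)!}{(x-1)!(m-x)!} × \frac{(Np-1)_{x-1} (Nq)_{m-x}}{(N-1)_{m-1}}

         = mp \sum_{x'=0}^{m-1} \frac{(m-1)!}{x'!(m-1-x')!} × \frac{(Np-1)_{x'} (Nq)_{m-1-x'}}{(N-1)_{m-1}},
                                                                  where x' = x - 1
         = mp,    ... (10.20)

since the expression under the summation sign equals

    \binom{Np-1}{x'} \binom{Nq}{m-1-x'} / \binom{N-1}{m-1},

so that the sum is unity.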
Proceeding in the same way, we can show that
    E[(x)_r] = (m)_r (Np)_r / (N)_r.    ... (10.21)

We can then calculate the raw moments \mu'_r and the central
moments \mu_r. It can be verified that

    \mu = mp,  \mu_2 = mpq \frac{N-m}{N-1},    ... (10.22)

    \mu_3 = mpq(q-p) \frac{(N-m)(N-2m)}{(N-1)(N-2)}    ... (10.23)

and \mu_4 = \frac{mpq(N-m)}{(N-1)(N-2)(N-3)} [N(N+1) - 6m(N-m)
              + 3pq\{N^2(m-2) - Nm^2 + 6m(N-m)\}].    ... (10.24)

As the reader can verify from (10.16), the hypergeometric
distribution is symmetrical if p = 1/2, is positively skew if p < 1/2 and is
negatively skew if p > 1/2. This will also be apparent from (10.23).
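
The mass-points and the moment formulae (10.20)-(10.24) can be checked by brute force for any particular set of parameters. The following sketch uses N = 50, Np = 4 and m = 10 as an assumed example; the name K stands for Np and, like the helper names, is introduced here only for illustration.

```python
from math import comb


def hypergeom_pmf(x, N, K, m):
    """Formula (10.16) with K = Np items of the first kind and N - K = Nq of the second."""
    if not 0 <= x <= m:
        return 0.0
    return comb(K, x) * comb(N - K, m - x) / comb(N, m)


N, K, m = 50, 4, 10          # assumed illustrative parameters
p = K / N
q = 1 - p

probs = {x: hypergeom_pmf(x, N, K, m) for x in range(m + 1)}
mass_points = [x for x, pr in probs.items() if pr > 0]
mu = sum(x * pr for x, pr in probs.items())


def central(r):
    """r-th central moment computed directly from the p.m.f."""
    return sum((x - mu) ** r * pr for x, pr in probs.items())


# Closed forms (10.22)-(10.24)
mu2 = m * p * q * (N - m) / (N - 1)
mu3 = m * p * q * (q - p) * (N - m) * (N - 2 * m) / ((N - 1) * (N - 2))
mu4 = (m * p * q * (N - m) / ((N - 1) * (N - 2) * (N - 3))
       * (N * (N + 1) - 6 * m * (N - m)
          + 3 * p * q * (N ** 2 * (m - 2) - N * m ** 2 + 6 * m * (N - m))))

print(mass_points)
print(mu, m * p)
print(central(2), mu2)
print(central(3), mu3)
print(central(4), mu4)
```

Since Np = 4 here, only the values 0 to 4 carry positive probability, in agreement with the inequality m - Nq <= x <= Np; the directly computed moments should match the closed forms up to rounding.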




10.5 The binomial distribution 
This is, again, a discrete distribution and has the p.m.f.

    f(x) = \binom{m}{x} p^x q^{m-x}    if x = 0, 1, 2, ..., m
         = 0    otherwise,    ... (10.25)

where m is a positive integer, 0 < p < 1 and q = 1 - p.
This distribution has two parameters, viz. m and p.
Clearly, f(x) \geq 0, for all x.

Further, \sum_x f(x) = \sum_{x=0}^{m} \binom{m}{x} p^x q^{m-x} = (q + p)^m = 1.


Consider a set of trials each of which can result in an event E.
(By ‘trial’ we mean an attempt to produce a particular event,
which is neither certain nor impossible.) The occurrence of E will
be referred to as a ‘success’ and its non-occurrence as a ‘failure’.
Suppose that the trials are independent (that is, suppose the
probability of a success, as well as of a failure, in any one trial is
not affected by the outcomes of any other trials) and that the
probability of success is the same, say p, for each trial, the probability
of failure being q (= 1 - p). Such a set of trials is called a set
of Bernoullian trials. Thus m tosses of a coin may be looked upon as
a set of Bernoullian trials (where by ‘success’ we may mean the
occurrence of a head), with p = 1/2 if the coin is unbiassed. Again,
consider an urn containing two types of objects, Np objects of the
first type and Nq of the second. If m objects are drawn, one by one
and with replacements, the objects in the urn being thoroughly mixed
before each drawing, then one will again get a set of Bernoullian
trials. The appearance in a drawing of an object of the first type
may now be called a success, its probability being Np/N = p.

Let us determine the probability of getting a total of x successes
in a set of m Bernoullian trials. The probability of having successes
in x trials, and hence failures in the remaining m - x trials, in any
preassigned order is

    p^x q^{m-x},

since the trials are independent and for each trial the probability of
success is p and the probability of failure is q. Now, x successes
and m - x failures may occur in \binom{m}{x} orders, \binom{m}{x} being the number of
ways in which the x places to be occupied by successes can be
chosen from the total number of m places in a sequence. The
required probability is, therefore,

    \binom{m}{x} p^x q^{m-x}

if x = 0, 1, ..., m (and is zero for other values of x).

Hence the binomial distribution will be the appropriate theoretical
distribution when we consider sets of Bernoullian trials and
the variable of interest is the number of successes per set of trials.
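
The counting argument can be mirrored by listing every possible sequence of successes and failures. The sketch below takes m = 5 and p = 0.3 as assumed illustrative values, adds up the probability p^x q^{m-x} of each sequence according to its number of successes, and compares the totals with formula (10.25).

```python
from itertools import product
from math import comb

m, p = 5, 0.3        # assumed illustrative values
q = 1 - p

# Enumerate all 2^m sequences of outcomes: 1 = success, 0 = failure.
by_enumeration = [0.0] * (m + 1)
for sequence in product((0, 1), repeat=m):
    x = sum(sequence)                           # number of successes in this sequence
    by_enumeration[x] += p ** x * q ** (m - x)  # probability of this particular order

# Binomial probabilities from formula (10.25)
by_formula = [comb(m, x) * p ** x * q ** (m - x) for x in range(m + 1)]

for x in range(m + 1):
    print(x, round(by_enumeration[x], 6), round(by_formula[x], 6))
```

The two columns coincide because exactly \binom{m}{x} of the enumerated sequences contain x successes.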

Note that the binomial distribution may also serve as an
approximation to the hypergeometric (10.16) provided m is very
small compared to N, i.e. provided m/N is negligibly small. For, in
this case, the hypergeometric probability

    \binom{Np}{x} \binom{Nq}{m-x} / \binom{N}{m}

      = \binom{m}{x} (Np)_x (Nq)_{m-x} / (N)_m

      = \binom{m}{x} \frac{p (p - \frac{1}{N}) \cdots (p - \frac{x-1}{N}) × q (q - \frac{1}{N}) \cdots (q - \frac{m-x-1}{N})}{1 (1 - \frac{1}{N}) \cdots (1 - \frac{m-1}{N})}

becomes approximately equal to the binomial probability

    \binom{m}{x} p^x q^{m-x}.

In a sampling situation, this will mean that we are sampling from a
very large lot (having items of two kinds), so that sampling with
replacements and sampling without replacements become practically
equivalent.

An idea about the nature of the approximation may be had
from the following table, where we compare the hypergeometric
probabilities for (1) N = 50, m = 10, p = 0.08, (2) N = 100, m = 10,
p = 0.08 and (3) N = 1,000, m = 10, p = 0.08 with the binomial
probabilities for m = 10 and p = 0.08.




TABLE 10.1

COMPARISON OF THE HYPERGEOMETRIC DISTRIBUTIONS WITH
m = 10, p = 0.08 AND (1) N = 50, (2) N = 100, (3) N = 1,000
AND THE BINOMIAL DISTRIBUTION WITH SAME m AND p

Value of x      Hypergeometric probability       Binomial
                (1)        (2)        (3)        probability
    0           0.3968     0.4166     0.4327     0.4344
    1           0.4290     0.4015     0.3800     0.3777
    2           0.1524     0.1506     0.1481     0.1478
    3           0.0208     0.0283     0.0337     0.0343
    4           0.0009     0.0029     0.0050     0.0052
    5           0          0.0002     0.0005     0.0005
    6           0          0.0000     0.0000     0.0000
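
The entries of Table 10.1 can be regenerated from formulae (10.16) and (10.25). The sketch below is one possible way of doing so; the helper names are illustrative, and K = Np is obtained by rounding N × 0.08 to the nearest integer (4, 8 and 80 for the three lots).

```python
from math import comb


def hypergeom_pmf(x, N, K, m):
    """Formula (10.16), with K = Np items of the first kind in a lot of N."""
    if not 0 <= x <= m:
        return 0.0
    return comb(K, x) * comb(N - K, m - x) / comb(N, m)


def binom_pmf(x, m, p):
    """Formula (10.25)."""
    if not 0 <= x <= m:
        return 0.0
    return comb(m, x) * p ** x * (1 - p) ** (m - x)


m, p = 10, 0.08
lots = (50, 100, 1000)

print("x   " + "  ".join(f"N={N:<5}" for N in lots) + "  binomial")
for x in range(7):
    row = [hypergeom_pmf(x, N, round(N * p), m) for N in lots]
    row.append(binom_pmf(x, m, p))
    print(x, "  " + "   ".join(f"{value:.4f}" for value in row))
```

As N grows with m fixed, each hypergeometric column moves towards the binomial column, which is the point the table is meant to bring out.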


10.6 Moments of the binomial distribution 
To start with, let us determine the moments about zero of the
distribution defined by (10.25). It is seen that

    E[(x)_r] = \sum_{x=0}^{m} x(x-1) \cdots (x-r+1) \frac{m!}{x!(m-x)!} p^x q^{m-x}

             = \sum_{x=r}^{m} \frac{m!}{(x-r)!(m-x)!} p^x q^{m-x}

             = m(m-1) \cdots (m-r+1) p^r \sum_{x'=0}^{m-r} \frac{(m-r)!}{x'!(m-r-x')!} p^{x'} q^{m-r-x'}
                                                              (putting x' = x - r)
             = m(m-1) \cdots (m-r+1) p^r (q+p)^{m-r}
             = (m)_r p^r.    ... (10.26)
Hence the first moment about zero is
    \mu'_1 = E(x) = mp.
The second moment about zero is
    \mu'_2 = E(x^2)
           = E[(x)_2] + E(x)    [since x^2 = x(x-1) + x]
           = (m)_2 p^2 + mp.



In the same way, remembering that
    $x^3 = x(x-1)(x-2) + 3x(x-1) + x = (x)_3 + 3(x)_2 + x$
and $x^4 = x(x-1)(x-2)(x-3) + 6x(x-1)(x-2) + 7x(x-1) + x$
    $= (x)_4 + 6(x)_3 + 7(x)_2 + x$
and using eqn. (10.26), we have
    $\mu'_3 = (m)_3\, p^3 + 3(m)_2\, p^2 + mp$
and $\mu'_4 = (m)_4\, p^4 + 6(m)_3\, p^3 + 7(m)_2\, p^2 + mp$.
Thus the mean of the distribution is
    $\mu = mp$.                                      ... (10.27)
By virtue of equation (10.15), we also have
    $\mu_2 = \mu'_2 - \mu'^2_1$
    $= \{(m)_2\, p^2 + mp\} - m^2 p^2$
    $= mp(1-p) = mpq$,                               ... (10.28)
so that the standard deviation of the distribution is
    $\sigma = \sqrt{mpq}$.                           ... (10.29)
Similarly,
    $\mu_3 = \mu'_3 - 3\mu'_2 \mu'_1 + 2\mu'^3_1$
    $= mp(1-p)(1-2p) = mpq(q-p)$,                    ... (10.30)
and $\mu_4 = \mu'_4 - 4\mu'_3 \mu'_1 + 6\mu'_2 \mu'^2_1 - 3\mu'^4_1$
    $= 3m^2 p^2 (1-p)^2 + mp(1-p)\{1 - 6p(1-p)\}$
    $= 3m^2 p^2 q^2 + mpq(1 - 6pq)$.                 ... (10.31)
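
The closed forms (10.28), (10.30) and (10.31) can be checked numerically by summing over the distribution itself. The sketch below is an illustration only; the values m=12 and p=0.3 are arbitrary choices, not taken from the text.

```python
from math import comb

def binomial_central_moment(r, m, p):
    """r-th central moment of the binomial distribution by direct summation."""
    q = 1 - p
    mean = m * p
    return sum((x - mean) ** r * comb(m, x) * p**x * q ** (m - x)
               for x in range(m + 1))

m, p = 12, 0.3  # arbitrary illustrative values
q = 1 - p
closed_forms = {
    2: m * p * q,                                              # (10.28)
    3: m * p * q * (q - p),                                    # (10.30)
    4: 3 * m**2 * p**2 * q**2 + m * p * q * (1 - 6 * p * q),   # (10.31)
}
for r, value in closed_forms.items():
    print(r, binomial_central_moment(r, m, p), value)
```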


The moments can perhaps be more easily obtained from the
moment-generating function. By definition, the m.g.f. of a random
variable x about a is the function of the real non-random variable t

    $M_a(t) = E[e^{t(x-a)}]$,                        ... (10.32)
provided the expectation exists for every t in some interval containing
the value t=0. We thus have

    $M_a(t) = E\{1 + t(x-a) + \frac{t^2}{2!}(x-a)^2 + \cdots + \frac{t^r}{r!}(x-a)^r + \cdots\}$
    $= 1 + tE(x-a) + \frac{t^2}{2!}E(x-a)^2 + \cdots + \frac{t^r}{r!}E(x-a)^r + \cdots$




Thus the rth moment about a is obtained as the coefficient of $t^r/r!$ in
the expansion of $M_a(t)$. Naturally, the central moments will be
provided by the m.g.f. about the mean ($\mu$), i.e. by
    $M_\mu(t) = E[e^{t(x-\mu)}]$.                    ... (10.33)
For the binomial distribution (10.25),

    $M_0(t) = \sum_{x=0}^{m} e^{tx} \binom{m}{x} p^x q^{m-x}$
    $= (q + pe^t)^m$
    $= \{1 + (e^t - 1)p\}^m$.                        ... (10.34)
From the formula for moments, we have, for this distribution,
    $\gamma_1 = \sqrt{\beta_1} = (q-p)/\sqrt{mpq}$   ... (10.35)
and $\gamma_2 = \beta_2 - 3 = (1 - 6pq)/mpq$.        ... (10.36)

As (10.35) indicates, the distribution is symmetrical, positively
skew or negatively skew according as p=1/2, p<1/2 or p>1/2. This can
also be seen directly from (10.25). Further, (10.36) shows that the
distribution is mesokurtic if pq=1/6, is platykurtic if pq>1/6 and is
leptokurtic if pq<1/6.


10.7 A recursion relation for moments of the binomial 
distribution 
The derivation of moments of the binomial distribution is facili- 
tated by the use of a recursion relation which we are going to 
establish. 
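From (10.27), we have $\mu = mp$. Also, by definition,

    $\mu_r = E(x-\mu)^r$
    $= \sum_{x=0}^{m} (x-mp)^r \binom{m}{x} p^x q^{m-x}$.
Differentiating this with respect to p, we get
    $\frac{d\mu_r}{dp} = -mr \sum_{x=0}^{m} (x-mp)^{r-1} \binom{m}{x} p^x q^{m-x}$
    $+ \sum_{x=0}^{m} (x-mp)^r \binom{m}{x} x p^{x-1} q^{m-x} - \sum_{x=0}^{m} (x-mp)^r \binom{m}{x} (m-x) p^x q^{m-x-1}$

    $= -mr\mu_{r-1} + \sum_{x=0}^{m} (x-mp)^r \binom{m}{x} p^{x-1} q^{m-x-1} (xq - mp + xp)$,

so that
    $pq\left(mr\mu_{r-1} + \frac{d\mu_r}{dp}\right) = \sum_{x=0}^{m} (x-mp)^{r+1} \binom{m}{x} p^x q^{m-x}$

or  $\mu_{r+1} = pq\left(mr\mu_{r-1} + \frac{d\mu_r}{dp}\right)$.       ... (10.37)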


Knowing that $\mu_0 = 1$, $\mu_1 = 0$ and using equation (10.37), one can
quite easily evaluate the higher-order moments of the binomial
distribution.
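
As an illustration of how relation (10.37) generates the central moments one after another, the sketch below treats each moment as a polynomial in p for a fixed numerical m. The representation (a list of coefficients, constant term first) and all function names are assumptions made for this example only.

```python
def poly_mul(a, b):
    """Product of two polynomials given as coefficient lists."""
    out = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out

def poly_add(a, b):
    """Sum of two polynomials given as coefficient lists."""
    n = max(len(a), len(b))
    return [(a[i] if i < len(a) else 0) + (b[i] if i < len(b) else 0)
            for i in range(n)]

def poly_diff(a):
    """Derivative with respect to p."""
    return [i * a[i] for i in range(1, len(a))] or [0]

def binomial_central_moments(m, r_max):
    """mu_0, ..., mu_{r_max} as polynomials in p, built from (10.37)."""
    pq = [0, 1, -1]        # pq = p - p^2
    mu = [[1], [0]]        # mu_0 = 1, mu_1 = 0
    for r in range(1, r_max):
        inner = poly_add([m * r * c for c in mu[r - 1]], poly_diff(mu[r]))
        mu.append(poly_mul(pq, inner))
    return mu

moments = binomial_central_moments(5, 4)
print(moments[2])  # coefficients of mu_2 = 5p - 5p^2, i.e. mpq with m = 5
```

For m=5 the third and fourth entries expand to the same polynomials as mpq(q-p) and 3m²p²q² + mpq(1-6pq).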


10.8 Fitting a binomial distribution to an observed distribu- 
tion 

When we fit a theoretical distribution to a given observed
distribution, our purpose is to examine whether the observed distribution
may be regarded as the distribution of the variable in a random
sample from the population characterised by the theoretical
distribution. For this it is necessary to calculate, corresponding to the
observed class-frequencies, the frequencies that are to be expected in
case the given observed distribution is really for a random sample
from the assumed population. (Close agreement between the two
series of frequencies is taken to mean that the fit has been good.) The
first step in fitting a theoretical distribution is, therefore, to estimate
the parameters of the theoretical distribution from the observed one,
unless the values of the parameters are assumed or are known
a priori.

A particularly simple method of estimating parameters is the 
method of moments. Supposing there are k parameters to be estimated, 
the method consists of the following steps : 

(a) To express the moments of the theoretical distribution 
in terms of the parameters. 

(b) To equate the first k moments of the theoretical distribution, 
expressed in terms of the parameters, to the corresponding moments 
of the observed distribution, 

(c) Finally, to solve the & resulting equations in order to 
determine the & parameters. 

The first k moments are taken because the error in a moment 
due to sampling increases with the order of the moment. 




For a binomial distribution the first moment about zero is
    $\mu = mp$,
while the first moment about zero of the observed distribution is $\bar{x}$.
Equating the two, we get
    $m\hat{p} = \bar{x}$,
so that the estimate of p is
    $\hat{p} = \bar{x}/m$.                           ... (10.38)
The estimated expected frequencies will, therefore, be :
    $n \times f(x) = n \times \binom{m}{x} \hat{p}^x (1-\hat{p})^{m-x}$,     ... (10.39)
for x=0, 1, 2, ..., m,
where n is the total frequency, i.e. the total number of sets of m
trials each.


Example 10.1 Twelve dice were thrown 2,630 times and each
time the number of dice which had 5 or 6 on the uppermost face
was recorded. The results are shown in the following table :

Number of dice with
5 or 6 uppermost    0    1    2    3    4    5    6    7    8    9   10   11   12

Frequency          18  115  326  548  611  519  307  133   40   11    2    0    0

Graduate the observed distribution (a) with a binomial distribution
for which p is unknown and (b) with a binomial distribution for
which p=1/3.

Case 1: Here, to fit a binomial distribution, p has to be
estimated from the observed distribution. The mean of the latter
distribution is
    $\bar{x} = \sum_x x f_x / 2630 = 4.05399$,
so the estimate of p is
    $\hat{p} = 4.05399/12 = 0.33783$.
The probabilities f(x) are calculated by using the relation
    $f(x) = \frac{m-x+1}{x} \times \frac{\hat{p}}{\hat{q}} \times f(x-1)$,
for x=1, 2, ..., m. Here
    $f(0) = \hat{q}^m$
or  $\log f(0) = m \log \hat{q} = 12 \times \log 0.66217$
    $= \bar{3}.8516340 = \log 0.0071061$,
so that f(0)=0.0071061.
Also, $\hat{p}/\hat{q} = 0.51019$.
The subsequent calculations are shown in the following table :
TABLE 10.2

FITTING A BINOMIAL DISTRIBUTION TO THE FREQUENCY
DISTRIBUTION OF NUMBER OF DICE SHOWING 5 OR
6 IN 2,630 THROWS OF 12 DICE (p ESTIMATED FROM DATA)

   x     (m-x+1)/x   col.(2) x p/q   f(x) = col.(3)   Expected     Observed
                                       x f(x-1)       frequency    frequency
                                                     = n x col.(4)
  (1)       (2)          (3)            (4)             (5)          (6)
   0         -            -          0.0071061         18.69         18
   1       12          6.12228       0.0435055        114.42        115
   2        5.5        2.80604       0.1220782        321.07        326
   3        3.33333    1.70063       0.2076098        546.01        548
   4        2.25       1.14793       0.2383215        626.79        611
   5        1.6        0.81630       0.1945418        511.64        519
   6        1.16667    0.59522       0.1157952        304.54        307
   7        0.85714    0.43730       0.0506372        133.18        133
   8        0.625      0.31887       0.0161467         42.47         40
   9        0.44444    0.22675       0.0036613          9.63         11
  10        0.3        0.15306       0.0005604          1.47          2
 11, 12      -            -          0.0000363*         0.09          0
 Total       -            -          1.0000000      2,630.00      2,630

*Obtained from the identity: $f(11) + f(12) = 1 - \sum_{x=0}^{10} f(x)$.

A comparison of the last two columns of the table indicates that
the fit has been quite satisfactory.




Case 2: Here the procedure is the same as in Case 1, but for p
we now use its given value, 1/3. So
    $f(0) = q^m = (2/3)^{12}$,
or  $\log f(0) = 12(\log 2 - \log 3)$
    $= \bar{3}.8869044 = \log 0.0077073$,
giving f(0)=0.0077073.
We also note that
    p/q = 1/2.
The expected frequencies may then be calculated as in Case 1.
TABLE 10.3

FITTING A BINOMIAL DISTRIBUTION TO THE FREQUENCY
DISTRIBUTION OF NUMBER OF DICE SHOWING 5 OR
6 IN 2,630 THROWS OF 12 DICE (p=1/3)

   x     (m-x+1)/x   col.(2) x p/q   f(x) = col.(3)   Expected     Observed
                                       x f(x-1)       frequency    frequency
                                                     = n x col.(4)
  (1)       (2)          (3)            (4)             (5)          (6)
   0         -            -          0.0077073         20.27         18
   1       12          6.00000       0.0462438        121.62        115
   2        5.5        2.75000       0.1271704        334.46        326
   3        3.33333    1.66667       0.2119511        557.43        548
   4        2.25       1.12500       0.2384450        627.11        611
   5        1.6        0.80000       0.1907560        501.69        519
   6        1.16667    0.58333       0.1112737        292.65        307
   7        0.85714    0.42857       0.0476886        125.42        133
   8        0.625      0.31250       0.0149027         39.19         40
   9        0.44444    0.22222       0.0033117          8.71         11
  10        0.3        0.15000       0.0004968          1.31          2
 11, 12      -            -          0.0000529*         0.14          0
 Total       -            -          1.0000000      2,630.00      2,630

*Obtained from the identity: $f(11) + f(12) = 1 - \sum_{x=0}^{10} f(x)$.




Here also a comparison of cols. (5) and (6) of the table shows 
that the fit has been fairly satisfactory, although it is less good 
than in Case 1. 
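
The two fits of Example 10.1 can be reproduced with a few lines of code. This is a sketch only: the observed frequencies are those of the example (with zero frequency taken for both 11 and 12 dice), the variable and function names are illustrative, and the results may differ in the last digit from the printed tables, which were worked with logarithms.

```python
observed = [18, 115, 326, 548, 611, 519, 307, 133, 40, 11, 2, 0, 0]  # x = 0..12
m = 12
n = sum(observed)

def binomial_probs(m, p):
    """f(0), ..., f(m) from f(0) = q^m and
    f(x) = (m-x+1)/x * p/q * f(x-1)."""
    q = 1 - p
    f = [q**m]
    for x in range(1, m + 1):
        f.append((m - x + 1) / x * (p / q) * f[-1])
    return f

mean = sum(x * fx for x, fx in enumerate(observed)) / n
p_hat = mean / m  # method-of-moments estimate, as in (10.38)

for label, p in (("p estimated from data", p_hat), ("p = 1/3", 1 / 3)):
    expected = [n * fx for fx in binomial_probs(m, p)]
    print(label, [round(e, 2) for e in expected])
```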

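10.9 The Poisson distribution
This, again, is a discrete distribution. It has the p.m.f.

    $f(x) = \exp(-\lambda)\lambda^x/x!$  if x=0, 1, 2, ...
          $= 0$  otherwise,                          ... (10.40)
where the parameter $\lambda$ is a positive quantity.
It is readily seen that $f(x) \ge 0$ for all x. Further,

    $\sum_{x=0}^{\infty} f(x) = \sum_{x=0}^{\infty} \exp(-\lambda)\lambda^x/x! = \exp(-\lambda) \sum_{x=0}^{\infty} \lambda^x/x!$
    $= \exp(-\lambda)\exp(\lambda) = 1$.
The distribution may be looked upon as a limiting form of the
binomial distribution. Thus suppose for a binomial distribution,
$m \to \infty$ and $p \to 0$, mp remaining the same ($=\lambda$, say). In that case,
for any fixed x,

    $\lim \binom{m}{x} p^x q^{m-x} = \lim \frac{m(m-1)\cdots(m-x+1)}{x!} \left(\frac{\lambda}{m}\right)^x \left(1-\frac{\lambda}{m}\right)^{m-x}$

    $= \frac{\lambda^x}{x!} \lim \left\{1\left(1-\frac{1}{m}\right)\left(1-\frac{2}{m}\right)\cdots\left(1-\frac{x-1}{m}\right)\right\} \frac{\lim (1-\lambda/m)^m}{\lim (1-\lambda/m)^x}$

    $= \exp(-\lambda)\lambda^x/x!$,
since
    $\lim \left\{1\left(1-\frac{1}{m}\right)\left(1-\frac{2}{m}\right)\cdots\left(1-\frac{x-1}{m}\right)\right\} = 1$,
    $\lim (1-\lambda/m)^m = \exp(-\lambda)$
and $\lim (1-\lambda/m)^x = 1$.

Thus we may say that as $m \to \infty$ and $p \to 0$, mp remaining the
same ($=\lambda$), the binomial distribution defined by (10.25) tends to
the Poisson distribution defined by (10.40).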




The practical implication of this result is that in case m is very
large and p is very small, but mp is of moderate magnitude, the
binomial probabilities $\binom{m}{x} p^x q^{m-x}$ may be well approximated by the
much simpler Poisson probabilities $\exp(-\lambda)\lambda^x/x!$, where $\lambda = mp$.

Table 10.4 should give a good idea about the nature of the
approximation. As is to be expected, the quality of the approximation
improves, for the same value of $\lambda = mp$, with increasing m and
decreasing p.

TABLE 10.4
COMPARISON OF THE BINOMIAL DISTRIBUTIONS WITH
(1) m=100, p=0.05 AND (2) m=500, p=0.01,
AND THE POISSON DISTRIBUTION WITH λ=5

                        Probability
Value of x        Binomial            Poisson
                (1)       (2)
    0         0.0059    0.0066        0.0067
    1         0.0312    0.0332        0.0337
    2         0.0812    0.0836        0.0842
    3         0.1396    0.1402        0.1404
    4         0.1781    0.1760        0.1755
    5         0.1800    0.1764        0.1755
    6         0.1500    0.1469        0.1462
    7         0.1060    0.1048        0.1044
    8         0.0649    0.0652        0.0653
    9         0.0349    0.0360        0.0363
   10         0.0167    0.0179        0.0181
   11         0.0072    0.0080        0.0082
   12         0.0028    0.0033        0.0034
   13         0.0010    0.0012        0.0013
   14         0.0003    0.0005        0.0005
   15         0.0001    0.0002        0.0002
   16         0.0000    0.0000        0.0000
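
The comparison in Table 10.4 is easy to recompute. The sketch below is an illustration, not part of the original text; the function names are chosen for this example.

```python
from math import comb, exp, factorial

def binom_pmf(x, m, p):
    """Binomial probability C(m, x) p^x q^(m-x)."""
    return comb(m, x) * p**x * (1 - p) ** (m - x)

def poisson_pmf(x, lam):
    """Poisson probability exp(-lam) lam^x / x!."""
    return exp(-lam) * lam**x / factorial(x)

for x in range(17):
    print(x,
          f"{binom_pmf(x, 100, 0.05):.4f}",
          f"{binom_pmf(x, 500, 0.01):.4f}",
          f"{poisson_pmf(x, 5):.4f}")
```

With mp held at 5, the column for m=500 lies closer to the Poisson column than the one for m=100, as the text anticipates.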




However, there is an alternative and more general way of looking 
at the Poisson distribution. 

Consider a small interval, say of time t (or a small part of
one-dimensional, two-dimensional or three-dimensional space, with
length, area or volume t). Let E be an event that may occur within
such an interval such that the probability is approximately $\mu t$ that E
will occur once in the interval and the probability is approximately
zero that it will occur more than once in the interval. Further,
suppose that any number of such intervals are statistically independent
in so far as the occurrence of E is concerned.

Now, take a given interval of length T that contains m of the
small intervals, m being sufficiently large. Then the probability
that the event will occur x times in that interval is, approximately,

    $\binom{m}{x} (\mu t)^x (1-\mu t)^{m-x} = \frac{1}{x!} \cdot 1\left(1-\frac{1}{m}\right)\left(1-\frac{2}{m}\right)\cdots\left(1-\frac{x-1}{m}\right) \times (\mu T)^x \left(1-\frac{\mu T}{m}\right)^{m-x}$.

Since m is large, this is again approximately equal to
    $\exp(-\mu T)(\mu T)^x/x!$.                      ... (10.41)
Thus x may be taken to have the Poisson distribution with $\lambda = \mu T$.
($\mu$ is said to be the rate of occurrence of the event per unit time.)

This is how we may account for the fact that the distribution of 
the number of telephone calls made through an exchange per, 
say, one-minute interval or the number of cars passing through a 
street-crossing per five-minute interval during the busy (or the slack) 
hours of the day, the distribution of the number of defects per 
one-metre piece of a fibre, the number of misprints per page of one 
of the early proofs of a book, etc., are found to closely follow the 
Poisson form. A variety of such examples is given in [2]. 


10.10 Moments of the Poisson distribution 
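For the Poisson distribution, moments of all orders exist. We
have

    $E[(x)_r] = \sum_{x=0}^{\infty} x(x-1)(x-2)\cdots(x-r+1) f(x)$
    $= \sum_{x=r}^{\infty} x(x-1)(x-2)\cdots(x-r+1) \exp(-\lambda)\lambda^x/x!$
    $= \exp(-\lambda) \sum_{x=r}^{\infty} \lambda^x/(x-r)!$
    $= \lambda^r \exp(-\lambda) \sum_{x'=0}^{\infty} \lambda^{x'}/x'! = \lambda^r \exp(-\lambda)\exp(\lambda)$
    $= \lambda^r$.                                   ... (10.42)
Hence the first four moments about zero are
    $\mu'_1 = E[(x)_1] = \lambda$,
    $\mu'_2 = E[(x)_2] + E[(x)_1]$
    $= \lambda^2 + \lambda$,
    $\mu'_3 = E[(x)_3] + 3E[(x)_2] + E[(x)_1]$
    $= \lambda^3 + 3\lambda^2 + \lambda$
and $\mu'_4 = E[(x)_4] + 6E[(x)_3] + 7E[(x)_2] + E[(x)_1]$
    $= \lambda^4 + 6\lambda^3 + 7\lambda^2 + \lambda$.
Thus the mean of the distribution is
    $\mu = \lambda$.                                 ... (10.43)
We also have
    $\mu_2 = \mu'_2 - \mu'^2_1$
    $= (\lambda^2 + \lambda) - \lambda^2 = \lambda$,                ... (10.44)
so that the standard deviation of the distribution is
    $\sigma = \sqrt{\lambda}$.                       ... (10.44a)
Similarly,
    $\mu_3 = \mu'_3 - 3\mu'_2\mu'_1 + 2\mu'^3_1$
    $= (\lambda^3 + 3\lambda^2 + \lambda) - 3(\lambda^2 + \lambda)\lambda + 2\lambda^3$
    $= \lambda$                                      ... (10.45)
and $\mu_4 = \mu'_4 - 4\mu'_3\mu'_1 + 6\mu'_2\mu'^2_1 - 3\mu'^4_1$
    $= (\lambda^4 + 6\lambda^3 + 7\lambda^2 + \lambda) - 4(\lambda^3 + 3\lambda^2 + \lambda)\lambda + 6(\lambda^2 + \lambda)\lambda^2 - 3\lambda^4$
    $= 3\lambda^2 + \lambda$.                        ... (10.46)

Thus it is seen that for the Poisson distribution with parameter $\lambda$,
the mean, the variance and the third central moment are all
equal to $\lambda$.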




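The moment-generating function of the distribution is defined
for all t and is

    $M_0(t) = E(e^{tx})$
    $= \sum_{x=0}^{\infty} e^{tx} \exp(-\lambda)\lambda^x/x!$
    $= \exp(-\lambda) \sum_{x=0}^{\infty} (\lambda e^t)^x/x!$
    $= \exp[\lambda(e^t - 1)]$.                      ... (10.47)

For this distribution,
    $\gamma_1 = \sqrt{\beta_1} = 1/\sqrt{\lambda}$   ... (10.48a)
and $\gamma_2 = \beta_2 - 3 = 1/\lambda$.            ... (10.48b)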


Thus the distribution is seen to be positively skew and leptokurtic. 


10.11 A recursion relation for moments of the Poisson 
distribution 
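Like the moments of the binomial distribution, the moments of
the Poisson distribution are linked by a recursion relation.
For the Poisson distribution, we have

    $\mu_r = \sum_{x=0}^{\infty} (x-\lambda)^r \exp(-\lambda)\lambda^x/x!$.
Differentiating both sides with respect to $\lambda$, we get
    $\frac{d\mu_r}{d\lambda} = -r \sum_{x=0}^{\infty} (x-\lambda)^{r-1} \exp(-\lambda)\lambda^x/x! - \sum_{x=0}^{\infty} (x-\lambda)^r \exp(-\lambda)\lambda^x/x!$
    $+ \sum_{x=0}^{\infty} (x-\lambda)^r \exp(-\lambda)\, x\lambda^{x-1}/x!$

    $= -r\mu_{r-1} + \frac{1}{\lambda} \sum_{x=0}^{\infty} (x-\lambda)^{r+1} \exp(-\lambda)\lambda^x/x!$

or  $\lambda\left(r\mu_{r-1} + \frac{d\mu_r}{d\lambda}\right) = \sum_{x=0}^{\infty} (x-\lambda)^{r+1} \exp(-\lambda)\lambda^x/x!$,
i.e. $\mu_{r+1} = \lambda\left(r\mu_{r-1} + \frac{d\mu_r}{d\lambda}\right)$.      ... (10.49)

Putting $\mu_0 = 1$ and $\mu_1 = 0$ in (10.49) and taking r=1, 2, 3, etc.,
successively, one can obtain the central moments of higher orders.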
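
To illustrate the use of relation (10.49), the sketch below builds the central moments as polynomials in λ, each stored as a list of coefficients with the constant term first. The representation and the function name are assumptions made for this example.

```python
def poisson_central_moments(r_max):
    """mu_0, ..., mu_{r_max} of the Poisson distribution as polynomials
    in lambda, using mu_{r+1} = lambda * (r * mu_{r-1} + d mu_r / d lambda)."""
    mu = [[1], [0]]  # mu_0 = 1, mu_1 = 0
    for r in range(1, r_max):
        deriv = [i * c for i, c in enumerate(mu[r])][1:]  # d mu_r / d lambda
        prev = [r * c for c in mu[r - 1]]                 # r * mu_{r-1}
        size = max(len(deriv), len(prev))
        inner = [(deriv[i] if i < len(deriv) else 0) +
                 (prev[i] if i < len(prev) else 0) for i in range(size)]
        mu.append([0] + inner)                            # multiply by lambda
    return mu

for r, coeffs in enumerate(poisson_central_moments(5)):
    print(r, coeffs)
```

The entries for r=2, 3 and 4 correspond to λ, λ and λ + 3λ², in agreement with (10.44), (10.45) and (10.46).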




10.12 Fitting a Poisson distribution to an observed distri- 
bution 
The Poisson distribution has only one parameter, viz. $\lambda$, which
can be estimated from the observed data by the method of moments.
The mean of the Poisson distribution is $\lambda$, while the mean of the
observed distribution is $\bar{x}$. The method of moments, therefore,
requires that we take as our estimate
    $\hat{\lambda} = \bar{x}$.
The expected frequencies corresponding to the observed
frequencies will then be obtained as
    $n \times f(x) = n \times \exp(-\hat{\lambda})\hat{\lambda}^x/x!$, for x=0, 1, 2, etc.   ... (10.50)


Example 10.3 The following table gives the frequency distribution
of number of weed seeds per packet for 196 one-lb. packets of a
variety of pulses :

Number of weed
seeds per packet    0    1    2    3    4    5    6    7    8    9

Frequency           7   33   54   37   34   16    8    5    1    1


In order to fit a Poisson distribution to these data, we first have,
as an estimate $\hat{\lambda}$ of the parameter $\lambda$, the observed mean
    $\bar{x} = \sum_x x f_x / n$
    $= 568/196 = 2.898$.
The expected frequencies are then obtained from the formula
    $n \times f(x) = 196 \times \exp[-2.898]\,(2.898)^x/x!$.

We note that
    $\log \hat{\lambda} = 0.4620983$
and $\log \exp(-\hat{\lambda}) = -\hat{\lambda} \log e$
    $= -2.898 \times 0.4342945$
    $= -1.2585855$.

The subsequent calculations are shown in Table 10.5.




(As the reader may well realise, one can obtain the value of f(0)
as antilog $\bar{2}.7414145 = 0.055133$ and then calculate

    $f(1) = \frac{\hat{\lambda}}{1} \times f(0)$,  $f(2) = \frac{\hat{\lambda}}{2} \times f(1)$,  $f(3) = \frac{\hat{\lambda}}{3} \times f(2)$,

and so on. The defect of this alternative method is that here the
errors of approximation accumulate and so probabilities for values
towards the end of the table may be rendered highly unreliable.)


TABLE 10.5

FITTING A POISSON DISTRIBUTION TO THE FREQUENCY
DISTRIBUTION OF NUMBER OF WEED SEEDS
PER PACKET OF PULSES

Number of      x!     x log λ̂ - λ̂ log e      f(x)          Expected     Observed
weed seeds                               = antilog(3)/(2)  frequency    frequency
per packet (x)                                             = n x (4)
   (1)        (2)           (3)              (4)             (5)          (6)
    0           1      $\bar{2}.7414145$   0.055133         10.79          7
    1           1      $\bar{1}.2035128$   0.15978          31.32         33
    2           2      $\bar{1}.6656111$   0.23152          45.38         54
    3           6        0.1277094         0.22364          43.83         37
    4          24        0.5898077         0.16203          31.76         34
    5         120        1.0519060         0.093913         18.41         16
    6         720        1.5140043         0.045360          8.89          8
    7        5040        1.9761026         0.018779          3.68          5
    8       40320        2.4382009         0.006803          1.33          1
    9      362880        2.9002992         0.002190          0.43          1
  ≥10           -            -                 -             0.18          0
 Total          -            -                 -           196.00        196


Comparing the last two columns of this table, we may say that 
the fit has been only fairly good. 
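
The Poisson fit of Example 10.3 can be reproduced as follows. This is a sketch under the assumption that the observed frequencies are those listed in the example; names are illustrative, and because the estimate is not rounded to 2.898 and no logarithm tables are used, the figures may differ slightly from those printed in the table.

```python
from math import exp, factorial

observed = [7, 33, 54, 37, 34, 16, 8, 5, 1, 1]  # x = 0..9
n = sum(observed)

# Method-of-moments estimate: lambda-hat equals the observed mean.
lam_hat = sum(x * fx for x, fx in enumerate(observed)) / n

probs = [exp(-lam_hat) * lam_hat**x / factorial(x) for x in range(len(observed))]
expected = [n * p for p in probs]
expected_tail = n * (1 - sum(probs))  # expected frequency for x >= 10

for x, (e, o) in enumerate(zip(expected, observed)):
    print(x, round(e, 2), o)
print(">=10", round(expected_tail, 2), 0)
```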




10.13 The negative binomial distribution 
This is another distribution of the discrete type. It has the 
probability-mass function 


    $f(x) = \binom{r+x-1}{x} p^r q^x$  if x=0, 1, 2, ...
          $= 0$  otherwise,                          ... (10.51)
where r is a positive integer, 0<p<1 and q=1-p.
In case r=1, i.e. in case
    $f(x) = p q^x$  if x=0, 1, 2, ...
          $= 0$  otherwise,
it is said to be a geometric distribution.

Note that $f(x) \ge 0$ for all x,
and $\sum_{x=0}^{\infty} f(x) = \sum_{x=0}^{\infty} \binom{r+x-1}{x} p^r q^x$
    $= p^r \sum_{x=0}^{\infty} \binom{r+x-1}{x} q^x$
    $= p^r (1-q)^{-r} = 1$,
so that f(x) is indeed a p.m.f. Also, f(x) for any non-negative
integer x is seen to be the (x+1)st term in the expansion of the
binomial $p^r (1-q)^{-r}$ with a negative index. Hence the name
‘negative binomial’.

Consider an indefinite series of Bernoullian trials. Suppose p is
the probability of the occurrence of an event E (called a ‘success’)
in a trial so that q=1-p is the probability of the non-occurrence of
E (called a ‘failure’). Let the trials be repeated until the event E
occurs r times. The probability that x failures will precede the rth
success is
    probability that E occurs r-1 times (and fails to
    occur x times) in the first r+x-1 trials × probability
    that E occurs in the (r+x)th trial
    $= \binom{r+x-1}{r-1} p^{r-1} q^x \times p$
    $= \binom{r+x-1}{r-1} p^r q^x$, for x=0, 1, 2, ...
Thus the negative binomial distribution arises as the distribution 
of the number of failures preceding a specified number of successes 
in an indefinite series of Bernoullian trials. Exercise 10.7 presents an 
alternative and more interesting model leading to this distribution. 




We have, for the distribution given by (10.51),

    $E[(x)_k] = \sum_{x=0}^{\infty} (x)_k \frac{(r+x-1)!}{x!\,(r-1)!} p^r q^x$

    $= \frac{p^r}{(r-1)!} \sum_{x=k}^{\infty} \frac{(r+x-1)!}{(x-k)!} q^x$

    $= \frac{(r+k-1)!}{(r-1)!} p^r q^k \sum_{x=k}^{\infty} \frac{(r+x-1)!}{(x-k)!\,(r+k-1)!} q^{x-k}$

    $= (r+k-1)_k\, p^r q^k (1-q)^{-(r+k)}$
    $= r(r+1)\cdots(r+k-1)\, q^k/p^k$.               ... (10.52)

Hence the mean is
    $\mu = E[(x)_1] = rq/p$,                         ... (10.53)
and the variance is
    $\mu_2 = E[(x)_2] + E[(x)_1] - \{E[(x)_1]\}^2$
    $= r(r+1)q^2/p^2 + rq/p - r^2 q^2/p^2$
    $= rq(q+p)/p^2 = rq/p^2$.                        ... (10.54)

Similarly, one can show that
    $\mu_3 = rq(1+q)/p^3$                            ... (10.55)
and $\mu_4 = \{rq(1 + 4q + q^2) + 3r^2 q^2\}/p^4$,   ... (10.56)
so that $\gamma_1 = (1+q)/\sqrt{rq}$                 ... (10.56a)
and $\gamma_2 = (1 + 4q + q^2)/rq$.                  ... (10.56b)

Thus the distribution is necessarily positively skew and leptokurtic.
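
Formulas (10.53) and (10.54) can be checked numerically by summing the p.m.f. (10.51) over a long but finite range. The sketch below is an illustration only; r=3 and p=0.4 are arbitrary choices, and the infinite sum is truncated at a point where the remaining tail is negligible for these values.

```python
from math import comb

def negbin_pmf(x, r, p):
    """P[x failures precede the r-th success], as in (10.51)."""
    return comb(r + x - 1, x) * p**r * (1 - p) ** x

r, p = 3, 0.4          # arbitrary illustrative values
q = 1 - p
xs = range(400)        # truncation of the infinite range

total = sum(negbin_pmf(x, r, p) for x in xs)
mean = sum(x * negbin_pmf(x, r, p) for x in xs)
variance = sum((x - mean) ** 2 * negbin_pmf(x, r, p) for x in xs)

print(total)                      # close to 1
print(mean, r * q / p)            # compare with (10.53)
print(variance, r * q / p**2)     # compare with (10.54)
```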


10.14 Rectangular (or uniform) distribution
The theoretical distributions that we have considered so far are 
all meant for discrete variables. The rest of this chapter will be 
devoted to theoretical distributions of the continuous type. The 
simplest distribution of this group is the rectangular distribution, 
which has got equal probability-densities for all values throughout 
the range of the continuous variable x. The probability-density 

function, f (x), is defined by 
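    $f(x) = 1/(b-a)$  if $a \le x \le b$
          $= 0$  otherwise.                          ... (10.57)
Clearly, $f(x) \ge 0$ for all x,
and $\int_{-\infty}^{\infty} f(x)\,dx = \int_a^b \frac{dx}{b-a} = 1$.

The mean of the distribution is

    $\mu = \int_{-\infty}^{\infty} x f(x)\,dx = \int_a^b \frac{x\,dx}{b-a} = \frac{b^2 - a^2}{2(b-a)} = \frac{a+b}{2}$.   ... (10.58)

The rth moment about the start of the curve, a, is given by

    $\mu'_r = \int_{-\infty}^{\infty} (x-a)^r f(x)\,dx = \int_a^b \frac{(x-a)^r\,dx}{b-a} = \frac{(b-a)^{r+1}}{(r+1)(b-a)}$
    $= (b-a)^r/(r+1)$.                               ... (10.59)
So the variance, $\sigma^2$, is given by
    $\sigma^2 = \mu'_2 - \mu'^2_1$
    $= (b-a)^2/3 - (b-a)^2/4 = (b-a)^2/12$.          ... (10.60)
Similarly, we have
    $\mu_3 = 0$ and $\mu_4 = (b-a)^4/80$.            ... (10.61)
Thus $\gamma_1 = 0$ and $\gamma_2 = -1.2$.           ... (10.61a)

Obviously, the distribution is symmetrical and highly platykurtic.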


10.15 Normal distribution 

Of all theoretical distributions for continuous variables, the most 
important is the so-called normal or Gaussian distribution. 

The distribution is defined by the probability-density function 


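    $f(x) = \frac{h}{\sqrt{\pi}} \exp[-h^2 (x-a)^2]$,  $-\infty < x < \infty$,   ... (10.62)

where it is assumed that h>0.
Clearly, f(x) is positive for all values of x. Further, using the
improper integral

    $\int_0^{\infty} \exp[-z^2]\,dz = \frac{\sqrt{\pi}}{2}$,

we have
    $\int_{-\infty}^{\infty} f(x)\,dx = \frac{2h}{\sqrt{\pi}} \int_a^{\infty} \exp[-h^2 (x-a)^2]\,dx$   (since the distribution
                                                       is symmetrical about a)
    $= \frac{2}{\sqrt{\pi}} \int_0^{\infty} \exp[-z^2]\,dz$   [putting h(x-a)=z]
    $= 1$.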




10.16 Properties of the normal distribution 
The more important properties of the distribution are as follows : 
(a) From (10.62) it is seen that the distribution is symmetrical
about the point x=a, since

    $f(a+u) = f(a-u) = \frac{h}{\sqrt{\pi}} \exp[-h^2 u^2]$,
whatever u may be.
(b) Since the distribution is symmetrical about a, its mean and
median coincide, both being equal to a. Its mode is also equal to
a, as can be seen from the fact that
    $f'(a) = 0$ and $f''(a) < 0$.

This is also evident from the fact that $\exp[-h^2 u^2]$ decreases
monotonically as $u^2$ increases from zero, i.e. as u deviates from zero in
either direction.

(c) Again, because the distribution is symmetrical, its odd-order
central moments are identically equal to zero. As regards the central
moments of even orders, we have

    $\mu_{2r} = \frac{h}{\sqrt{\pi}} \int_{-\infty}^{\infty} (x-a)^{2r} \exp[-h^2 (x-a)^2]\,dx$

    $= \frac{2h}{\sqrt{\pi}} \int_a^{\infty} (x-a)^{2r} \exp[-h^2 (x-a)^2]\,dx$   (since the integrand is
                                                           symmetrical about a)

    $= \frac{2h}{\sqrt{\pi}} \int_0^{\infty} \left(\frac{z}{h^2}\right)^r \exp(-z) \frac{dz}{2h\sqrt{z}}$   [putting $z = h^2 (x-a)^2$,
                                                           so that $dx = \frac{dz}{2h\sqrt{z}}$]

    $= \frac{\Gamma(r + \frac{1}{2})}{\sqrt{\pi}\, h^{2r}}$

    $= \frac{(2r-1)(2r-3)\cdots 3 \cdot 1}{2^r h^{2r}}$   (since $\Gamma(\frac{1}{2}) = \sqrt{\pi}$).      ... (10.63)
In particular,
    $\sigma^2 = \mu_2 = \frac{1}{2h^2}$.             ... (10.63a)

Hence (10.63) may also be written in terms of $\sigma$ as
    $\mu_{2r} = (2r-1)(2r-3)\cdots \times 3 \times 1 \times \sigma^{2r}$.   ... (10.63b)




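The moment-generating function about $\mu$ of the normal
distribution is given by

    $M_\mu(t) = E[e^{t(x-\mu)}]$

    $= \int_{-\infty}^{\infty} \exp[t(x-\mu)] \frac{1}{\sigma\sqrt{2\pi}} \exp[-(x-\mu)^2/2\sigma^2]\,dx$

    $= \exp[t^2\sigma^2/2] \frac{1}{\sigma\sqrt{2\pi}} \int_{-\infty}^{\infty} \exp[-\{(x-\mu)^2 - 2t(x-\mu)\sigma^2 + t^2\sigma^4\}/2\sigma^2]\,dx$

    $= \exp[t^2\sigma^2/2] \frac{1}{\sigma\sqrt{2\pi}} \int_{-\infty}^{\infty} \exp[-\{x-\mu-t\sigma^2\}^2/2\sigma^2]\,dx$

    $= \exp[t^2\sigma^2/2]$,
so that $\mu_{2r+1} = 0$
and $\mu_{2r} = \frac{(2r)!}{2^r\, r!} \sigma^{2r} = (2r-1)(2r-3)\cdots \times 3 \times 1 \times \sigma^{2r}$,
as before.
For the normal distribution, we have
    $\mu_3 = 0$ and $\mu_4 = 3\sigma^4$,             ... (10.64)
giving $\gamma_1 = 0$ and $\gamma_2 = 0$.            ... (10.65)
Since $a = \mu$, the mean of the distribution, and $h = \frac{1}{\sigma\sqrt{2}}$, the
probability-density function may be written in its more usual form :
    $f(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp[-(x-\mu)^2/2\sigma^2]$,     ... (10.66)
where $-\infty < x < \infty$.

(d) The curve has two points of inflection at a distance $\sigma$ on
either side of $\mu$, since

    $\frac{df(x)}{dx} = -\frac{x-\mu}{\sigma^3\sqrt{2\pi}} \exp[-(x-\mu)^2/2\sigma^2]$

and $\frac{d^2 f(x)}{dx^2} = \frac{(x-\mu)^2}{\sigma^5\sqrt{2\pi}} \exp[-(x-\mu)^2/2\sigma^2] - \frac{1}{\sigma^3\sqrt{2\pi}} \exp[-(x-\mu)^2/2\sigma^2]$

    $= \frac{1}{\sigma^3\sqrt{2\pi}} \left\{\frac{(x-\mu)^2}{\sigma^2} - 1\right\} \exp[-(x-\mu)^2/2\sigma^2]$,   ... (10.67)

which, when equated to zero, gives $x = \mu \mp \sigma$.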




Thus the curve of the normal distribution is convex upwards
within the interval ($\mu-\sigma$, $\mu+\sigma$), and outside this interval the curve
is concave upwards. Fig. 10.1 shows a number of normal curves with
the same mean but with different standard deviations.


[Fig. 10.1 Normal curves with the same mean but with different
standard deviations. Horizontal axis: variable; vertical axis:
probability-density.]


(e) Let $\tau = \frac{x-\mu}{\sigma}$, where x is a normal variable with mean $\mu$ and
standard deviation $\sigma$. Then obviously $\tau$ has mean zero and standard
deviation unity. Further, it can be shown that $\tau$ is itself a normal
variable (vide Section 15.6). Such a normal variable is called a
standard normal variable or a normal deviate (with unit standard deviation).
It has the probability-density function

    $\phi(\tau) = \frac{1}{\sqrt{2\pi}} \exp[-\tau^2/2]$.              ... (10.68)

The values of $\phi(k)$ and

    $\Phi(k) = \int_{-\infty}^{k} \phi(\tau)\,d\tau$,                  ... (10.69)

which is the cumulative probability $P[\tau \le k]$, are given in statistical
tables for different values of k. In the present book they appear in
Table I of Appendix B.

The ordinates and the cumulative probabilities $P[x \le q]$, for the
normal variable x with mean $\mu$ and standard deviation $\sigma$, are
obtained from the tabulated values using the following relations :

    $f(q) = \frac{1}{\sigma} \times \phi\left(\frac{q-\mu}{\sigma}\right)$       ... (10.68a)
and $P[x \le q] = \Phi\left(\frac{q-\mu}{\sigma}\right)$.                ... (10.69a)
Further, since the distribution of $\tau$ is symmetrical about 0,
    $\phi(-k) = \phi(k)$                             ... (10.68b)
and $\Phi(-k) = 1 - \Phi(k)$.                        ... (10.69b)

Hence the values of $\phi(k)$ and $\Phi(k)$ are tabulated only for
non-negative values of k.

(f) The probability for a normal variable x (with mean $\mu$ and
standard deviation $\sigma$) to lie in any specified interval, say in the
interval from a to b (a<b), can be obtained from the tabulated
values of $\Phi(\tau)$. Thus

    $P[a < x < b] = P[x < b] - P[x < a] = \Phi\left(\frac{b-\mu}{\sigma}\right) - \Phi\left(\frac{a-\mu}{\sigma}\right)$.   ... (10.69c)

Such probabilities are shown in the following table for some
typical parts of the range :

Range                          Approximate probability

Below μ-2σ                             0.02
Between μ-2σ and μ-σ                   0.14
Between μ-σ and μ                      0.34
Between μ and μ+σ                      0.34
Between μ+σ and μ+2σ                   0.14
Above μ+2σ                             0.02
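
Where printed tables of the normal deviate are not at hand, Φ can be evaluated through the error function, using the standard identity Φ(k) = ½[1 + erf(k/√2)]. The sketch below uses Python's `math.erf`; the function names are illustrative and not part of the original text.

```python
from math import erf, sqrt

def Phi(k):
    """Cumulative probability P[tau <= k] for a standard normal variable."""
    return 0.5 * (1 + erf(k / sqrt(2)))

def normal_interval_prob(a, b, mu, sigma):
    """P[a < x < b] for a normal variable, as in (10.69c)."""
    return Phi((b - mu) / sigma) - Phi((a - mu) / sigma)

mu, sigma = 0.0, 1.0  # any mean and standard deviation give the same table
print("below mu-2s      ", round(Phi(-2), 2))
for lo, hi in ((-2, -1), (-1, 0), (0, 1), (1, 2)):
    prob = normal_interval_prob(mu + lo * sigma, mu + hi * sigma, mu, sigma)
    print(f"mu{lo:+d}s to mu{hi:+d}s ", round(prob, 2))
print("above mu+2s      ", round(1 - Phi(2), 2))
print("outside mu +/- 3s", round(1 - normal_interval_prob(-3, 3, mu, sigma), 4))
```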


(g) Another point to be noted is that although a normal
variable can theoretically take any value between $-\infty$ and $\infty$, for
all practical purposes it may be assumed to lie between $\mu - 3\sigma$ and
$\mu + 3\sigma$, the probability of its lying beyond these limits being very
small, 0.0027 approximately. The interval ($\mu - 3\sigma$, $\mu + 3\sigma$) is often
referred to as the effective range of the normal variable.




10.17 Limiting forms of binomial and Poisson distributions 
We have seen that a binomial distribution with parameters m
and p has :
    $\gamma_1 = (q-p)/\sqrt{mpq}$ and $\gamma_2 = (1 - 6pq)/mpq$.

When m is very large, and neither p nor q is very small,
    $\gamma_1 \approx 0$ and $\gamma_2 \approx 0$.

These indicate (and this has, in fact, been rigorously proved)
that under the above conditions, the binomial distribution can be
approximated by a normal distribution. The normal distribution will
have the same mean, mp, and the same standard deviation, $\sqrt{mpq}$,
as the binomial distribution.

Again, a Poisson distribution with parameter $\lambda$ has

    $\gamma_1 = 1/\sqrt{\lambda}$ and $\gamma_2 = 1/\lambda$,
both of which become negligibly small when $\lambda$ is a very large number.
This suggests that a Poisson distribution can be approximated by a
normal distribution, provided $\lambda$ is sufficiently large. (This result too
can be rigorously proved.) The approximating normal distribution
has the same mean, $\lambda$, and the same standard deviation, $\sqrt{\lambda}$, as the
Poisson distribution.

One point is to be noted in this connection. The binomial and
Poisson distributions are discrete distributions, whereas the normal
distribution is continuous. The probability that x assumes the value r is

    $\binom{m}{r} p^r q^{m-r}$  or  $\exp(-\lambda)\lambda^r/r!$
according as x is a binomial or a Poisson variable. To approximate
the above probability by means of a normal distribution, we have
to integrate the appropriate normal density function from r-1/2 to
r+1/2. This has to be done since we are replacing, while making
the approximation, a discrete variable by a continuous variable.
Hence the approximate values for the above probabilities will be :

    (i) $\frac{1}{\sqrt{2\pi mpq}} \int_{r-1/2}^{r+1/2} \exp[-(x-mp)^2/2mpq]\,dx$

and (ii) $\frac{1}{\sqrt{2\pi\lambda}} \int_{r-1/2}^{r+1/2} \exp[-(x-\lambda)^2/2\lambda]\,dx$.




Similarly, if we have to find the probability $P[a \le x \le b]$, a and b
being two positive integers, and x being a binomial or a Poisson
variable, then we have to integrate the corresponding normal density
function from a-1/2 to b+1/2. However, it will almost be the same
as the integral from a to b provided 1/2 is very small compared to
$\sqrt{mpq}$ or $\sqrt{\lambda}$ (as the case may be).
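
The effect of integrating from r-1/2 to r+1/2 can be seen in a small numerical comparison. The sketch below is illustrative only: m=100 and p=0.5 are arbitrary choices, and Φ is evaluated through `math.erf` rather than from tables.

```python
from math import comb, erf, sqrt

def Phi(k):
    """Cumulative probability of the standard normal variable."""
    return 0.5 * (1 + erf(k / sqrt(2)))

def binom_pmf(r, m, p):
    """Exact binomial probability that x assumes the value r."""
    return comb(m, r) * p**r * (1 - p) ** (m - r)

def normal_approx(r, mean, sd):
    """Normal density integrated from r - 1/2 to r + 1/2."""
    return Phi((r + 0.5 - mean) / sd) - Phi((r - 0.5 - mean) / sd)

m, p = 100, 0.5  # arbitrary illustrative values
mean, sd = m * p, sqrt(m * p * (1 - p))
for r in (40, 45, 50, 55, 60):
    print(r, round(binom_pmf(r, m, p), 5), round(normal_approx(r, mean, sd), 5))
```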


10.18 Fitting a normal distribution 
As in other cases, the first step in fitting a normal distribution to
observed data consists in estimating the parameters $\mu$ and $\sigma$ by the
method of moments. Since $\mu$ and $\sigma$ are the mean and standard
deviation of the theoretical distribution, the method of moments
gives as their estimates $\bar{x}$ and s, the mean and standard deviation
of the observed distribution. With these estimates, we can then
calculate the expected frequencies by using the tables of the normal
deviate. Consider, for instance, the expected frequency for the
interval from x=a to x=b. This expected frequency is

    $n \int_a^b \frac{1}{s\sqrt{2\pi}} \exp[-(x-\bar{x})^2/2s^2]\,dx = n \int_{(a-\bar{x})/s}^{(b-\bar{x})/s} \phi(\tau)\,d\tau$   [where $\tau = \frac{x-\bar{x}}{s}$]

    $= n\left[\int_{-\infty}^{(b-\bar{x})/s} \phi(\tau)\,d\tau - \int_{-\infty}^{(a-\bar{x})/s} \phi(\tau)\,d\tau\right]$

    $= n\left[\Phi\left(\frac{b-\bar{x}}{s}\right) - \Phi\left(\frac{a-\bar{x}}{s}\right)\right]$.
The values of $\Phi(\tau)$ can be obtained from the tables of the normal
deviate.

If one wants to draw the fitted curve over the histogram of the
observed distribution, it will be necessary to compute the ordinates

    $n \times \frac{1}{s\sqrt{2\pi}} \exp[-(x-\bar{x})^2/2s^2]$

for some appropriate values of x. Usually one takes the ordinates at
the class-boundaries of the observed distribution. Multiplication by
n is essential, for otherwise the ordinates will not be comparable to
the frequency-densities of the observed distribution.




Now,

    $n \times \frac{1}{s\sqrt{2\pi}} \exp[-(x-\bar{x})^2/2s^2] = \frac{n}{s}\, \phi(\tau)$,

where $\tau = (x-\bar{x})/s$. The values of $\phi(\tau)$ can be obtained from the tables
of the normal deviate.

Example 10.4 Fit a normal distribution to the frequency distribu- 
tion of height of Indian adult males given in Table 6.10. Also draw 
the fitted curve over the histogram of the observed distribution. 

For the distribution of height of Indian adult males, the mean
and standard deviation were found to be

    $\bar{x} = 164.734$ cm. and s=5.472 cm.

Here n=177 and n/s=32.346.


TABLE 10.6
FITTING A NORMAL DISTRIBUTION TO THE HEIGHT-DISTRIBUTION
OF INDIAN ADULT MALES (TABLE 6.10)

Height     τ =         φ(τ)      Ordinate     Φ(τ)        ΔΦ(τ)      Expected   Observed
(cm) x   (x-x̄)/s                 (n/s)φ(τ)                           frequency  frequency
                                                                     nΔΦ(τ)
  (1)      (2)         (3)         (4)         (5)         (6)         (7)        (8)

                                                        0.0001126*     0.020       0
144.55   -3.689    0.0004424     0.0143    0.0001126
                                                        0.0026478      0.469
149.55   -2.775    0.0084874     0.2745    0.0027604
                                                        0.0286123      5.064       3
154.55   -1.861    0.0706097     2.2839    0.0313727
                                                        0.1404492     24.860      24
159.55   -0.947    0.2547828     8.2412    0.1718219
                                                        0.3146168     55.687      58
164.55   -0.034    0.3987070    12.8966    0.4864387
                                                        0.3241316     57.371      60
169.55    0.880    0.2708640     8.7614    0.8105703
                                                        0.1530213     27.085      27
174.55    1.794    0.0798081     2.5815    0.9635916
                                                        0.0330236      5.845       2
179.55    2.708    0.0101984     0.3299    0.9966152
                                                        0.0032381      0.573
184.55    3.621    0.0005673     0.0183    0.9998533
                                                        0.0001467**    0.026       0

Total                                                                177.000     177

*It is the probability P[x<144.55]. **It is the probability P[x>184.55].




With these, we can now compute the expected frequencies for the
different class-intervals and the ordinates at the class-boundaries in
the manner explained above. In the tables $\phi(\tau)$ and $\Phi(\tau)$ are given
for values of $\tau$ at intervals of 0.01, while in the present case we have
taken $\tau = (x-\bar{x})/s$ correct to 3 decimal places. For obtaining $\phi(\tau)$
and $\Phi(\tau)$ for these values, we have applied linear interpolation.

The agreement between the observed and the expected series of 
frequencies would seem to be fairly good. This agreement will also 
be apparent from Fig. 10.2, where we have the fitted normal curve, 
obtained on the basis of col. (4) of Table 10.6, superimposed on the 
histogram of the observed distribution. 
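
The computations of Example 10.4 can be sketched in code as below. The class-boundaries, mean, standard deviation and n are taken from the example; Φ is computed from `math.erf` instead of being interpolated from tables, so the results agree with Table 10.6 only to about the second decimal place. All names are illustrative.

```python
from math import erf, exp, pi, sqrt

mean, s, n = 164.734, 5.472, 177
boundaries = [144.55 + 5 * i for i in range(9)]  # 144.55, 149.55, ..., 184.55

def phi(t):
    """Ordinate of the standard normal curve."""
    return exp(-t * t / 2) / sqrt(2 * pi)

def Phi(t):
    """Cumulative probability of the standard normal variable."""
    return 0.5 * (1 + erf(t / sqrt(2)))

# Cumulative probabilities at the boundaries, padded with 0 and 1 for the tails.
cum = [0.0] + [Phi((b - mean) / s) for b in boundaries] + [1.0]
expected = [n * (hi - lo) for lo, hi in zip(cum, cum[1:])]  # 10 classes incl. tails
ordinates = [n / s * phi((b - mean) / s) for b in boundaries]

for b, y in zip(boundaries, ordinates):
    print(f"{b:.2f}  ordinate {y:.4f}")
print([round(e, 3) for e in expected])
```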


[Fig. 10.2 Fitted normal curve together with the histogram for
the height-distribution of Indian adult males (Table 6.10). Horizontal
axis: height (cm), from 144.55 to 184.55; vertical axis: frequency-density.]


10.19 Importance of the normal distribution in Statistics 
The normal distribution plays a very important rôle in statistical
theory and its applications. As we have already seen, it has some
very simple properties which make it comparatively easy to deal with.
Consequently, it will be a distinct advantage if in any case the 
population distribution of the variable under consideration may be 
assumed to be of the normal type. Generally, such an assumption 
is found legitimate in most cases of data arising from biological and 
psychological measurements. Under certain conditions, it can also 
be shown that the distribution of errors of observation in repeated 
measurements on a physical constant may be supposed to be normal. 
Such conditions being more or less valid in the field of manufacturing 
industry as well, most data arising there are also found to follow 
the normal law. Moreover, as we saw earlier, it serves as an 
approximation to the binomial and Poisson distributions, under 
certain conditions. Also, the sampling distributions of many 
statistics follow the normal form either exactly or approximately. 
(Vide Chapters 15, 17, 18, and 19.) 

It should not, however, be supposed that the normal distribution 
is to be expected in all cases of continuous data. In fact, many 
distributions may be observed in practice, specially in the field of 
economics, which deviate markedly from the normal law. Even in 
some of these cases, the normal distribution may be used as a first 
approximation, and conclusions arrived at in this way will be found 
virtually the same as those obtained by using the exact distribution. 
In some other cases, a transformation of the variable (like the 
logarithmic transformation used on economic data and discussed in 
Section 10.20) will make the distribution very nearly normal. In case 
none of the above procedures is feasible, one will, of course, have to 
make use of other theoretical distributions, e.g. the Pearsonian system 
of curves, Edgeworth’s series, Gram-Charlier type A series, etc.


10.20 Log-normal distribution 

The variable x is said to have a log-normal distribution if $\ln x$ (or
$\log x$) is normally distributed. As $\ln x$ varies from $-\infty$ to $\infty$, here x
varies from 0 to $\infty$. If $\ln x$ has mean $\ln\xi$ and standard deviation $\delta$,
then it has the p.d.f. $f(\ln x)$ such that

\[
f(\ln x)\,d(\ln x) = \frac{1}{\delta\sqrt{2\pi}}
\exp[-(\ln x-\ln\xi)^2/2\delta^2]\,d(\ln x),
\quad -\infty<\ln x<\infty, \tag{10.70}
\]

and hence the p.d.f. of x is

\[
f(x) = \frac{1}{x\delta\sqrt{2\pi}}
\exp[-(\ln x-\ln\xi)^2/2\delta^2],
\quad 0<x<\infty. \tag{10.71}
\]
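Clearly, since $\ln\xi$ is the median of the distribution of $\ln x$ and
$\ln x$ is a monotonic function of x, the median of the distribution
of x is $\xi$.
Also, the distribution is unimodal, since

\begin{align*}
\frac{df(x)}{dx}
&= -\frac{1}{x^2\delta\sqrt{2\pi}}\exp[-(\ln x-\ln\xi)^2/2\delta^2]
 + \frac{1}{x\delta\sqrt{2\pi}}\exp[-(\ln x-\ln\xi)^2/2\delta^2]
   \left\{-\frac{1}{x\delta^2}(\ln x-\ln\xi)\right\} \\
&= -\frac{1}{x^2\delta\sqrt{2\pi}}\exp[-(\ln x-\ln\xi)^2/2\delta^2]
   \left\{\frac{1}{\delta^2}(\ln x-\ln\xi)+1\right\},
\end{align*}

which vanishes, apart from doing so at $x=0$ and for $x\to\infty$, at

\[
x=\xi\exp[-\delta^2]. \tag{10.72}
\]

(10.72) gives the unique mode of the distribution.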
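The rth moment about 0 is

\begin{align*}
\mu_r' = E(x^r)
&= \int_0^\infty \frac{x^{r-1}}{\delta\sqrt{2\pi}}
   \exp[-(\ln x-\ln\xi)^2/2\delta^2]\,dx \\
&= \frac{\xi^r}{\sqrt{2\pi}}\int_{-\infty}^{\infty}
   \exp[r\delta u]\exp[-u^2/2]\,du \\
&\qquad \text{(substituting } \frac{\ln x-\ln\xi}{\delta}=u
   \text{ or } x=\xi\exp[\delta u], \text{ so that } \frac{dx}{x}=\delta\,du) \\
&= \xi^r\exp[r^2\delta^2/2]\,\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty}
   \exp[-(u-r\delta)^2/2]\,du \\
&= \xi^r\exp[r^2\delta^2/2]. \tag{10.73}
\end{align*}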
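In particular, the mean of the distribution is

\[
\mu_1' = E(x) = \xi\exp[\delta^2/2] = \xi\omega \ \text{(say)}, \tag{10.74}
\]

and

\[
\mu_2' = \xi^2\exp[2\delta^2], \tag{10.75}
\]

so that

\[
\sigma^2 = \mu_2'-\mu_1'^2 = \xi^2\exp[\delta^2](\exp[\delta^2]-1)
= \xi^2\omega^2(\omega^2-1). \tag{10.76}
\]

Similarly,

\[
\mu_3 = \xi^3\omega^3(\omega^2-1)^2(\omega^2+2), \tag{10.77}
\]

and

\[
\mu_4 = \xi^4\omega^4(\omega^2-1)^2(\omega^8+2\omega^6+3\omega^4-3). \tag{10.78}
\]

Hence

\[
\gamma_1 = \sqrt{\omega^2-1}\,(\omega^2+2), \tag{10.79}
\]

and

\[
\gamma_2 = (\omega^2-1)(\omega^6+3\omega^4+6\omega^2+6). \tag{10.80}
\]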


Thus the distribution is seen to be positively skew and leptokurtic. 
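
The closed forms (10.72)-(10.80) can be checked numerically. The following is only a sketch: the function names and the values $\xi=2$, $\delta=0.5$ are illustrative assumptions, not part of the text. It computes the summary measures from the closed forms, with $\omega=\exp[\delta^2/2]$, and compares them with values built from the raw moments (10.73).

```python
import math

def lognormal_raw_moment(r, xi, delta):
    """r-th moment about 0 as in (10.73): xi^r * exp(r^2 * delta^2 / 2)."""
    return xi**r * math.exp(r**2 * delta**2 / 2)

def lognormal_summary(xi, delta):
    """Closed-form measures from (10.72)-(10.80); omega = exp(delta^2 / 2)."""
    w = math.exp(delta**2 / 2)
    return {
        "median": xi,
        "mode": xi * math.exp(-delta**2),
        "mean": xi * w,
        "variance": xi**2 * w**2 * (w**2 - 1),
        "gamma1": math.sqrt(w**2 - 1) * (w**2 + 2),
        "gamma2": (w**2 - 1) * (w**6 + 3 * w**4 + 6 * w**2 + 6),
    }

def central_from_raw(m1, m2, m3, m4):
    """Standard conversion from moments about 0 to central moments."""
    mu2 = m2 - m1**2
    mu3 = m3 - 3 * m2 * m1 + 2 * m1**3
    mu4 = m4 - 4 * m3 * m1 + 6 * m2 * m1**2 - 3 * m1**4
    return mu2, mu3, mu4

# Illustrative parameter values (assumed, not from the text).
xi, delta = 2.0, 0.5
summary = lognormal_summary(xi, delta)
raw = [lognormal_raw_moment(r, xi, delta) for r in (1, 2, 3, 4)]
mu2, mu3, mu4 = central_from_raw(*raw)
gamma1_check = mu3 / mu2**1.5      # skewness from the raw moments
gamma2_check = mu4 / mu2**2 - 3    # excess kurtosis from the raw moments
```

For any $\delta>0$ the sketch gives mode < median < mean and positive $\gamma_1$ and $\gamma_2$, in line with the remark that the distribution is positively skew and leptokurtic.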


10.21 Generalised systems of frequency curves 

The failure of the normal distribution to fit many distributions 
encountered in practice necessitated the development of generalised 
systems of frequency curves. The first approach, due to Karl 
Pearson, seeks to obtain a family of curves, known as Pearsonian 
curves, which would satisfactorily represent almost all practical 
distributions. The second approach, due to Bruns, Gram, Charlier 
and Edgeworth, seeks to represent a given density function as a 
linear combination of a simple density function and its derivatives. 
Thus if $\phi(x)$ is the normal density function and $\phi^{(r)}(x)$ its rth
derivative, the Gram-Charlier type A series has the form

\[
f(x) = \phi(x) + c_1\phi'(x) + c_2\phi''(x) + \cdots + c_r\phi^{(r)}(x) + \cdots. \tag{10.81}
\]
The third approach, due to Edgeworth and others, seeks a trans- 
formation of the variable x so that the transformed variable has, at 
least approximately, a simple (say a normal) distribution. The last 
two approaches are beyond the scope of the present book. We shall, 
however, give a brief account of the Pearsonian system of frequency 
curves. 

It is found that observed frequency distributions for homogeneous
populations have (i) a single mode and (ii) a high-order contact
with the x-axis at the extremities. Thus if $f(x)$ be the probability-
density function representing such a distribution, then

\[
\frac{df}{dx}=0 \ \text{at } x=a, \text{ the mode, and also when } f=0. \tag{10.82}
\]

A differential equation satisfying these conditions is

\[
\frac{df}{dx} = \frac{(x-a)f}{b_0+b_1x+b_2x^2}. \tag{10.82a}
\]


It is not necessary to take in the denominator of the right-hand side
terms beyond $b_2x^2$, because the differential equation (10.82a) is found
adequate in providing curves of varying shapes and forms.
For a general solution of the Pearsonian differential equation, it
is written in the form

\[
\frac{df}{f} = \frac{(x-a)\,dx}{b_0+b_1x+b_2x^2}.
\]

Integrating both sides, we have

\[
\ln f = \int\frac{(x-a)\,dx}{b_0+b_1x+b_2x^2} + \ln C,
\]

$\ln C$ being the constant of integration, so that

\[
f = C\exp\left[\int\frac{(x-a)\,dx}{b_0+b_1x+b_2x^2}\right]. \tag{10.83}
\]


The explicit form of the function (10.83) depends upon the integral 
in the exponent, which again depends upon the roots of the 
quadratic equation $b_0+b_1x+b_2x^2=0$ or, in other words, upon the
values of the constants. A brief description of the important Pearso- 
nian types is given below : 

Type I. This curve is obtained when the roots of the quadratic
are real and of opposite signs. This happens when $b_0$ and $b_2$ are of
opposite signs. Writing

\[
\kappa = \frac{b_1^2}{4b_0b_2}, \tag{10.84}
\]

we may say that we get the Type I curve when $\kappa<0$.
The solution of the differential equation gives the Type I curve as

\[
f(x) = C\left(1+\frac{x}{a_1}\right)^{m_1}\left(1-\frac{x}{a_2}\right)^{m_2},
\quad -a_1<x<a_2, \tag{10.85}
\]

with origin at mode, where $m_1/a_1=m_2/a_2$.
The curve reduces to the beta form under the substitution

\[
z = \frac{x+a_1}{a_1+a_2},
\]

when the equation to the curve becomes

\[
f(z) = \frac{1}{B(m_1+1,\,m_2+1)}\,z^{m_1}(1-z)^{m_2}, \quad 0\le z\le 1. \tag{10.86}
\]

The curve is bell-shaped, J-shaped or U-shaped, according as
both the constants $m_1$ and $m_2$ are positive, or only one is positive
and the other negative, or both are negative.


Type VI. The curve is obtained when the roots are real and are
of the same sign. This occurs when $b_0$ and $b_2$ are of the same sign
and $b_1^2>4b_0b_2$ or, in other words, when $\kappa>1$. The probability-
density function is

\[
f(x) = C(x-a)^{q_2}x^{-q_1}, \quad a<x<\infty, \tag{10.87}
\]

the origin being a units before the start of the curve.

The distribution reduces to the beta form under the transformation
$z=a/x$. The curve is bell-shaped if $q_2$ is positive and J-shaped
if $q_2$ is negative.

Type IV. This curve is obtained when the roots are imaginary
or when $b_1^2<4b_0b_2$, i.e. when $0<\kappa<1$. The equation to the
curve is

\[
f(x) = C\left(1+\frac{x^2}{a^2}\right)^{-m}\exp\left[-\nu\tan^{-1}\frac{x}{a}\right],
\quad -\infty<x<\infty, \tag{10.88}
\]

with origin $\nu a/(2m-2)$ units above the mean.

The curve is always bell-shaped.

Type III. This is a transition type and is obtained when $b_2=0$
or, in other words, when $\kappa\to\pm\infty$. The curve is

\[
f(x) = C\exp[-\gamma x]\left(1+\frac{x}{a}\right)^{\gamma a},
\quad -a<x<\infty, \tag{10.89}
\]

with origin at mode.
The distribution can be changed into the gamma form by using
the transformation $z=\gamma(x+a)$, when the density function reduces to

\[
f(z) = \frac{1}{\Gamma(p+1)}\exp[-z]\,z^{p}, \quad 0<z<\infty, \tag{10.90}
\]

where $p=\gamma a$.
The curve is bell-shaped if $\gamma a=p$ is positive and J-shaped if p is
negative.
Type V. This transition type is obtained when the roots are
equal, i.e. when $b_1^2=4b_0b_2$ or $\kappa=1$. The curve has the equation

\[
f(x) = Cx^{-p}\exp[-\gamma/x], \quad 0<x<\infty, \tag{10.91}
\]

with origin at the start of the curve.
It is transformed into the gamma form under the substitution
$\gamma/x=z$ and is always bell-shaped.


Type II. This is obtained when $b_1=0$, and $b_0$ and $b_2$ are of
opposite signs. The equation to the curve is

\[
f(x) = C\left(1-\frac{x^2}{a^2}\right)^{m}, \quad -a<x<a, \tag{10.92}
\]

with origin at mean.
Obviously, the curve is symmetrical about the origin and is
bell-shaped or U-shaped, according as m is positive or negative. This
reduces to the beta form under the transformation $z=1-x^2/a^2$.


Type VII. This occurs when $b_1=0$, and $b_0$ and $b_2$ are of the
same sign. The equation to the curve is

\[
f(x) = C\left(1+\frac{x^2}{a^2}\right)^{-m}, \quad -\infty<x<\infty, \tag{10.93}
\]

the origin being at the mean.
This is also symmetrical about the origin and is transformed
into the beta form under the substitution $z=(1+x^2/a^2)^{-1}$. This is
always bell-shaped.

The normal curve is also a transition type of the Pearsonian family
and is obtained when $b_1=b_2=0$.

It can be shown that $b_0$, $b_1$ and $b_2$, and hence $\kappa$, can be expressed
in terms of $\beta_1$ and $\beta_2$. Thus the curves of the Pearsonian family can
be specified by the $\beta_1$ and $\beta_2$ criteria.
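
The rules quoted above for the individual types can be collected into a small decision procedure. This is a sketch only: the function name `pearson_type` and the label returned for the case $b_0=0$ are assumptions made for illustration, and the exact comparisons with 0 and 1 would need a tolerance if the constants were estimated from data.

```python
def pearson_type(b0, b1, b2):
    """Classify a Pearsonian curve from b0, b1, b2 using kappa = b1^2 / (4 b0 b2).

    Follows the conditions stated in the text for Types I-VII and the
    normal curve. Cases the text does not describe are labelled as such.
    """
    if b1 == 0 and b2 == 0:
        return "normal"            # transition type: b1 = b2 = 0
    if b2 == 0:
        return "III"               # transition type: kappa -> +/- infinity
    if b0 == 0:
        return "not covered"       # kappa undefined; not discussed in the text
    if b1 == 0:
        # symmetrical curves: sign of b0 * b2 separates Type II from Type VII
        return "II" if b0 * b2 < 0 else "VII"
    kappa = b1**2 / (4 * b0 * b2)
    if kappa < 0:
        return "I"                 # real roots of opposite signs
    if kappa < 1:
        return "IV"                # imaginary roots
    if kappa == 1:
        return "V"                 # equal roots
    return "VI"                    # real roots of the same sign
```

For instance, `pearson_type(1, 1, -1)` has $\kappa=-0.25$ and is classed as Type I, while `pearson_type(1, 3, 1)` has $\kappa=2.25$ and is classed as Type VI.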

Writing the differential equation in the form

\[
\frac{df}{dx} = \frac{xf}{b_0+b_1x+b_2x^2} \quad \text{(origin at mode, } a\text{)},
\]

we have

\[
\frac{d^2f}{dx^2} = \frac{f}{(b_0+b_1x+b_2x^2)^2}\,\{x^2(1-b_2)+b_0\}.
\]

Thus each curve of the Pearsonian family has two points of
inflection, given by

\[
x = \pm\sqrt{\frac{b_0}{b_2-1}}, \tag{10.94}
\]

which are equidistant from the mode.


Questions and exercises 


10.1 Explain the meaning and utility of theoretical distributions, 
and indicate the relevance of probability-mass and probability- 
density functions. 


10.2 Derive the hypergeometric distribution from a suitable 
probability model. Also obtain its mean and s.d.


10.3 Derive the binomial distribution from a suitable probability 
model. Also indicate how this may be looked upon as a limiting 
form of the hypergeometric distribution. Obtain the mean and the 
s.d. of the distribution. 


10.4 Derive the Poisson distribution from a suitable probability 
model and also as a limiting form of the binomial distribution. Give 
examples of data for which the Poisson distribution is expected to 
give a good fit. 


10.5 Show that the normal distribution may be looked upon 
as a limiting form of the binomial and Poisson distributions. What 
are the important properties of this distribution ? Account for 
the importance of the normal distribution in statistical theory and 
practice.


10.6 Determine the modes of the binomial and Poisson distributions.
Show that the mode coincides with the mean when $mp$ or
$\lambda$ (as the case may be) is an integer.

Partial ans. The modes are the highest integers contained in
$(m+1)p$ and $\lambda$.


10.7 Let the intensity of accident-proneness, $\lambda$, of workmen
follow a gamma distribution with p.d.f.

\[
f(\lambda) = \frac{\alpha^{p}}{\Gamma(p)}\exp[-\alpha\lambda]\,\lambda^{p-1},
\quad 0<\lambda<\infty,
\]

and let the number of accidents made by a workman
whose intensity of accident-proneness is $\lambda$ follow a Poisson distribution
with p.m.f.

\[
p(x\,|\,\lambda) = \exp[-\lambda]\,\frac{\lambda^x}{x!}, \quad x=0, 1, 2, \ldots
\]

Show that the number of accidents x, made by a workman of unknown
accident-proneness, follows a negative binomial distribution.


10.8 Show that the cumulative probability of the binomial distribution
may be expressed in the form

\[
\sum_{x=k}^{m}\binom{m}{x}p^x(1-p)^{m-x}
= \frac{1}{B(k,\,m-k+1)}\int_0^p z^{k-1}(1-z)^{m-k}\,dz
\]

and that of the Poisson distribution in the form

\[
\sum_{x=0}^{k}\exp[-\lambda]\frac{\lambda^x}{x!}
= \frac{1}{\Gamma(k+1)}\int_\lambda^\infty \exp[-z]\,z^{k}\,dz.
\]


10.9 Obtain the moment-generating function of the negative 
binomial distribution and hence determine its first four moments. 


10.10 The Pascal distribution is defined by

\[
f(x) = \frac{1}{1+\mu}\left(\frac{\mu}{1+\mu}\right)^{x}, \quad x=0, 1, 2, \ldots,
\]

where $\mu>0$.
Find the mean and variance of the distribution.


10.11 Suppose 5% of the inhabitants of Calcutta are cricket fans. 
Determine approximately the probability that a sample of 100 
inhabitants will contain at least 8 cricket fans. Ans. 0.88.

10.12 The probability of getting no misprint in a page of a book
is 0.14. What is the probability that a page contains more than 2
misprints? (State the assumption you make.)

Ans. 0.31 (under proper assumption).

10.13 A Poisson distribution has a double mode at x=2 and
x=3. What is the probability that x will have one or the other of
the two values? Ans. 0.224.

10.14 Starting from an appropriate differential equation, obtain 
the curves of the Pearsonian system. Discuss their important 
properties. 

10.15 Show that for a symmetrical probability distribution
(either discrete or continuous), all odd-order central moments are
equal to zero.

10.16 A continuous random variable x having values only
between 0 and 4 has the density function $f(x)=\frac{1}{2}-ax$. Evaluate a.




10.17 Find the mean and variance of each of the following 
continuous probability distributions : 

(i) $f(x)=a\exp(-ax)$, $x>0$ and $a>0$;
(ii) $f(x)=\frac{1}{2}\exp(-|x|)$, $-\infty<x<\infty$.

10.18 The life (in hours) of electronic tubes of a certain type is
supposed to be normally distributed with $\mu=155$ hr. and $\sigma=19$ hr.
What is the probability that the life of a tube will be

(a) between 136 hr. and 174 hr.?
(b) between 117 hr. and 193 hr.?
(c) less than 117 hr.?
(d) more than 193 hr.?

If a sample of 200 tubes is taken, how many are expected to be
in each of the above groups?

Partial ans. The probabilities are:
(a) 0.68; (b) 0.96; (c) 0.02; (d) 0.02.

10.19 The results of a particular examination are shown below
in summary form:

Result                          Percentage of candidates
Passed with distinction                   15
Passed without distinction                42
Failed                                    43

Total                                    100

It is known that a candidate gets plucked if he obtains less than
40 marks (out of 100), while he must obtain at least 75 marks in
order to pass with distinction. Hence determine the mean and s.d.
of the distribution of marks, assuming it is of the normal type.

Ans. $\mu=45.09$; $\sigma=28.86$.


10.20 Show that the mean deviation about mean of a normal
distribution is $\sqrt{2/\pi}\,\sigma$, $\sigma$ being the s.d. of the distribution.

10.21 If $\log x$ is normally distributed with $\mu=1$ and $\sigma^2=4$, find
$P[\frac{1}{2}<x<2]$. Ans. 0.106.


10.22 There are 600 commerce students in the post-graduate
classes of a university, and the probability for any student to need
a copy of a particular text-book from the university library on any
day is 0.05. How many copies of the book should be kept in the
university library so that the probability may be greater than 0.90
that none of the students needing a copy from the library has to
come back disappointed? (Use the normal approximation to the
binomial probability law.) Ans. At least 37 copies.


10.23 Suppose the life-time (in hours) of a radio tube of a
certain type obeys the exponential law $f(x)=\frac{1}{\lambda}\exp[-x/\lambda]$, $x>0$,
with $\lambda=900$. A company producing tubes wishes to guarantee for
the articles a certain life-time. For how many hours should the tube
be guaranteed to function to achieve a probability of 0.90 that
it will function for (at least) the number of hours guaranteed?

Ans. 95 hours.


10.24 For the continuous probability distribution

\[
f(x) = a\exp[-a(x-\theta)], \quad \theta<x<\infty,
\]

where $a>0$, find the moment-generating function. Obtain the mean,
variance, $\beta_1$ and $\beta_2$ of the distribution.


10.25 In the course of an experiment, 15 mosquitoes were put in 
each of 120 jars and were next subjected to a dose of D.D.T. After 
4 hours the number alive in each jar was counted and the following 
frequency distribution was obtained : 


No. of mosquitoes alive   |  0    1    2    3    4    5    6    7    8
Frequency (no. of jars)   |  2   12   14   22   28   17   18   10    2

Find the frequencies that one would expect on the assumption
that each mosquito has a common probability of survival.


10.26 When the first proof of a book containing 250 pages was 
read, the following distribution of misprints was obtained : 


No. of misprints per page      Frequency
        0                         139
        1                          76
        2                          28
        3                           4
        4                           2
        5                           1
      Total                       250


Fit a Poisson distribution to the above data. 


10.27 A telephone switch-board handles 720 calls on the average
during a rush hour. The board can make 15 connections per
minute. Estimate the probability that the board will be overtaxed
during any minute in the rush hour. Ans. 0.156.


10.28 The following distribution relates to the number of accidents
to 647 women working on H.E. (high explosive) shells during
a 5-week period (given by Greenwood and Yule in J.R.S.S., 1920).
Show that a negative binomial distribution, rather than a Poisson
distribution, gives a very good fit to the data. How would you
explain this?

Number of accidents     0     1     2     3     4     5
Frequency             447   132    42    21     3     2


Hint: Refer to the result of Exercise 10.7. 


10.29 The following is the frequency distribution of right-hand
grip for 345 European males:

Right-hand grip      Frequency
 29.5- 39.5               1
 39.5- 49.5               2
 49.5- 59.5              12
 59.5- 69.5              52
 69.5- 79.5              99
 79.5- 89.5             101
 89.5- 99.5              55
 99.5-109.5              17
109.5-119.5               5
119.5-129.5               1

Total                   345


Find the expected frequencies for the above classes assuming that 
the population distribution of right-hand grip is normal. Draw the
fitted curve and the histogram of the observed distribution on the 
same graph paper. 

10.30 A car hire firm has two cars, which are hired out by the
day. It has been found that the number of demands for cars of the
firm on any day has a Poisson distribution with mean 1.5.

(a) Calculate the proportion of days on which neither car is
used and the proportion of days on which some demand is refused.

(b) If the two cars are used an equal number of times on the
average, on what proportion of days is a given one of the cars not
in use?

(c) How many cars should the firm have so as to meet all
demands on approximately 98% of days?

Ans. (a) 0.223, 0.191; (b) 0.390; (c) 4.


SUGGESTED READING 


[1] Elderton, W. P. and Johnson, N. L. Systems of Frequency Curves.
Cambridge University Press, 1969.

[2] Feller, W. An Introduction to Probability Theory and Its Applications,
Vol. I (Chs. 6–7). John Wiley, 1968, and Wiley Eastern, 1972.

[3] Hald, A. Statistical Theory with Engineering Applications (Chs.
5–7). John Wiley, 1952.

[4] Kendall, M. G. and Stuart, A. Advanced Theory of Statistics,
Vol. 1 (Chs. 5, 6). Charles Griffin, 1958.

[5] Mood, A. M., Graybill, F. A. and Boes, D. C. Introduction to the
Theory of Statistics (Chs. 2, 3). McGraw-Hill, 1974, and
Kogakusha.

[6] Parzen, E. Modern Probability Theory and Its Applications (Chs.
3–6). John Wiley, 1960, and Wiley Eastern, 1972.

[7] Weatherburn, C. E. A First Course in Mathematical Statistics
(Ch. 3). Cambridge University Press, 1947.


11 JOINT DISTRIBUTIONS OF ATTRIBUTES


11.1 Data on two or more attributes 

In some investigations, it may be appropriate to collect data, for
a given set of individuals, on more than one character at the same
time. The object here would be to look for any relationship that
may obtain among the characters. In this chapter we shall be
concerned with data on several attributes. The case of several variables
will be taken up in Chapters 12 and 13. (There may, of course, be a
third situation, where some of the characters are attributes and the 
others are variables. But this case can be dealt with by suitably 
adapting the methods appropriate for the first two cases and hence 
will not be discussed separately.) 

Thus the data may relate to the proficiency in English and the 
proficiency in mathematics of a group of high-school students (for 
each attribute there being, say, five classes: very good/good/ 
mediocre/bad/very bad); or to the subject-matter (fiction/non- 
fiction) and the readability (easy-reading/difficult-reading) of a
number of books ; or to the sex, the economic status (rich/middle- 
class/poor) and the level of education (illiterate/primary/high-school/ 
university) of a group of adults. 

In Table 11.1 data on two attributes are presented in a summary 
form. The figure in each cell stands for the number of individuals 
(i.e. the frequency) corresponding to a pair of forms of the two 
attributes. Thus, e.g., 19 is the number of adults attacked with 
fever among those administered quinine during the period, 193 the 
number of adults attacked with fever among those not administered 
quinine, and so on. The cell-frequencies, together with their grand 
total, give the joint (frequency) distribution of the attributes, because 
they show how the two attributes vary jointly in the given group of 
individuals. From the joint distribution, we also obtain two other 
types of distribution. Thus the row-totals (marginal frequencies),
together with the grand total, give the distribution of the attribute
‘precautionary measure’, to be called the marginal frequency distribution
of precautionary measure in the present context. On the other hand, 
the column-totals, together with the grand total, give the marginal 
distribution of ‘outcome’. The other type of distribution is given by 
each column or each row of frequencies of the table, together with 
the corresponding column or row-total. Take, e.g., the frequencies 
in the first row, together with the row-total 606. For these frequen- 
cies the form of the first attribute, ‘precautionary measure’, is the 
same but the form of the second varies. As such, it is said to give 
a conditional frequency distribution—the conditional distribution of 
‘outcome’ for the form ‘quinine used’ of precautionary measure. 
Similarly, the second row gives the conditional distribution of outcome
for the form ‘no quinine used’ of precautionary measure. The
frequencies in the first column and those in the second, in the same 
way, give the conditional distributions of the attribute ‘precautionary 
measure’ for the forms ‘attacked with malaria’ and ‘not attacked 
with malaria’, respectively, of the attribute ‘outcome’. 


TABLE 11.1
DATA ON THE USE OF QUININE AND INCIDENCE OF MALARIA
COLLECTED IN AN INVESTIGATION IN A STATE OF INDIA
(Each figure relates to the number of adults in each
category among a total of 3,540 adults)

                                        Outcome (A)
Precautionary           Attacked with        Not attacked
measure (B)             malaria (A)          with malaria ($\alpha$)       Total

Quinine used (B)        19 ($f_{AB}$)        587 ($f_{\alpha B}$)          606 ($f_B$)
No quinine used ($\beta$)   193 ($f_{A\beta}$)       2,741 ($f_{\alpha\beta}$)        2,934 ($f_\beta$)

Total                   212 ($f_A$)          3,328 ($f_\alpha$)            3,540 (n)


Clearly, the attributes need not have just two forms each; i.e.,
the table need not be a 2×2 table. Thus in Table 11.2 we have
data on two attributes each of which has three forms.

In each case, we might consider the relative frequencies, instead of 
the frequencies, which would also give the distributions—joint, margi- 
nal or conditional—of the attributes, although in a different form. 




11.2 Independence and association* 

Consider again two attributes, A and B. In the 2×2 case, the
two forms of A may be denoted by A (the ‘positive’ form, indicating
the presence of the character A) and $\alpha$ (the ‘negative’ form,
indicating the absence of the character A) and, similarly, the two
forms of B may be denoted by B and $\beta$. The four cell-frequencies
may be denoted by $f_{AB}$, $f_{A\beta}$, $f_{\alpha B}$ and $f_{\alpha\beta}$, and the total by n. Also,
the (marginal) frequencies for the A-classes may be denoted by $f_A$
and $f_\alpha$, and the (marginal) frequencies for the B-classes by $f_B$ and $f_\beta$.
Thus

\[
\left.
\begin{aligned}
f_A &= f_{AB}+f_{A\beta}, & f_\alpha &= f_{\alpha B}+f_{\alpha\beta}, \\
f_B &= f_{AB}+f_{\alpha B}, & f_\beta &= f_{A\beta}+f_{\alpha\beta},
\end{aligned}
\right\} \tag{11.1}
\]

and

\begin{align*}
n &= f_{AB}+f_{A\beta}+f_{\alpha B}+f_{\alpha\beta} \tag{11.2a} \\
  &= f_A+f_\alpha \tag{11.2b} \\
  &= f_B+f_\beta. \tag{11.2c}
\end{align*}


Suppose the individuals under consideration constitute the popu- 
lation itself and not just a sample from the population. Also suppose 
that none of the marginal frequencies is zero. Then the ratios
$f_{AB}/f_A$ and $f_{\alpha B}/f_\alpha$ give, respectively, the proportions of members of
the population having B, among those having A and among those
having $\alpha$. If these proportions be equal, we may say that the
presence or absence of the character A in an individual does not in 
any way determine whether B will be present. A and B may then be 
called statistically unrelated or independent. As opposed to the notion 
of independence, there is the notion of association. Thus A and B 
are said to be associated if they are not independent. 

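We have seen that, for A and B to be independent, we must have

\[
\frac{f_{AB}}{f_A} = \frac{f_{\alpha B}}{f_\alpha}. \tag{11.3}
\]

This implies

\[
\frac{f_{AB}}{f_A} = \frac{f_{AB}+f_{\alpha B}}{f_A+f_\alpha} = \frac{f_B}{n}
\]

or

\[
f_{AB} = \frac{f_Af_B}{n}. \tag{11.4a}
\]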


*The ideas in this section are comparable to those in Section 3.6. 




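Actually, (11.3) also implies

\begin{align*}
f_{A\beta} &= \frac{f_Af_\beta}{n}, \tag{11.4b} \\
f_{\alpha B} &= \frac{f_\alpha f_B}{n}, \tag{11.4c} \\
\text{and}\quad f_{\alpha\beta} &= \frac{f_\alpha f_\beta}{n}. \tag{11.4d}
\end{align*}

Since equation (11.4a) itself leads to (11.4b), (11.4c) and (11.4d),
and also to (11.3), it is taken as the defining equation for the
independence of A and B. This is done irrespective of whether $f_A$
and/or $f_\alpha$ is zero.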

Suppose A and B are not independent, i.e. are associated. We
may distinguish two cases. (i) If

\[
f_{AB} > \frac{f_Af_B}{n}, \tag{11.5}
\]

A and B occur together more frequently than they would have if
they had been independent. Hence in this case the attributes are
said to be positively associated (or, simply, associated). (ii) On the
other hand, if

\[
f_{AB} < \frac{f_Af_B}{n}, \tag{11.6}
\]

i.e. if A and B occur together less frequently than they would have
if they had been independent, then they are said to be negatively
associated (or disassociated).

As regards the definition of perfect association, we may adopt one
of two alternatives: (a) Thus we may say that there is perfect
positive association between A and B if all A’s are B’s and/or all B’s
are A’s, i.e. if $f_{A\beta}=0$ and/or $f_{\alpha B}=0$. Likewise, there may be said to
be perfect negative association if no A’s are B’s and/or no $\alpha$’s are $\beta$’s,
i.e. if $f_{AB}=0$ and/or $f_{\alpha\beta}=0$. (b) Alternatively, we may say that
there is perfect positive association if all A’s are B’s and all B’s are
A’s, i.e. if $f_{A\beta}=0$ and $f_{\alpha B}=0$; and that there is perfect negative
association if no A’s are B’s and no $\alpha$’s are $\beta$’s, i.e. if $f_{AB}=0$ and
$f_{\alpha\beta}=0$.

To keep these two cases distinct, the association will be said to 
be complete (positive or negative) in the first case and to be absolute 
in the second. 
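
The definitions of this section can be turned into a small check on the four cell-frequencies. What follows is a sketch under stated assumptions: the function name `classify_2x2` and the labels "partial" and "none" are illustrative choices that do not appear in the text. The comparison of $f_{AB}$ with $f_Af_B/n$ is done as $n f_{AB}$ against $f_Af_B$ so that only integers are involved.

```python
def classify_2x2(f_AB, f_Ab, f_aB, f_ab):
    """Classify a 2x2 table; b and a in the names stand for beta and alpha.

    Returns (kind, degree), where kind is 'independent', 'positive' or
    'negative', and degree is 'absolute', 'complete', 'partial' or 'none'.
    """
    n = f_AB + f_Ab + f_aB + f_ab
    f_A = f_AB + f_Ab          # marginal frequency of A
    f_B = f_AB + f_aB          # marginal frequency of B
    lhs, rhs = n * f_AB, f_A * f_B
    if lhs == rhs:
        return ("independent", "none")
    if lhs > rhs:
        kind, zeros = "positive", (f_Ab == 0, f_aB == 0)
    else:
        kind, zeros = "negative", (f_AB == 0, f_ab == 0)
    if all(zeros):
        degree = "absolute"    # both relevant cells empty
    elif any(zeros):
        degree = "complete"    # exactly one relevant cell empty
    else:
        degree = "partial"     # illustrative label for all other cases
    return (kind, degree)
```

With the cell-frequencies 19, 193, 587 and 2,741 of the quinine data, $n f_{AB}=67260$ is less than $f_Af_B=128472$, so the sketch reports a negative association that is neither complete nor absolute.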




11.3 Measures of association for the 2×2 case

We shall consider measures of the extent to which A and B, each 
of which occurs in two possible forms, may be said to be associated. 
Clearly, there are certain desiderata that such a measure should 
fulfil. For one thing, it should be independent of the total frequency 
n, just as the mean or the moments of a variable are, and thus depend 
on the relative frequencies in the cells rather than on the frequencies. 
Secondly, it should be zero in the case of independence, negative in 
the case of negative association and positive in the case of positive 
association. Thirdly, it should increase from its lowest possible 
value through zero to its highest possible value as we proceed from 
perfect negative association through independence to perfect positive 
association. Lastly, it should preferably vary between two definite 
limits, like —1 and +1. 

Obviously, the difference

\[
\delta_{AB} = f_{AB}-\frac{f_Af_B}{n}, \tag{11.7}
\]

between the actual frequency for the cell (A, B) and the value that it
should assume if A and B are independent, may serve as the basis for
such a measure. Keeping all the desiderata in mind, one may use

\begin{align*}
Q_{AB} &= \frac{n\delta_{AB}}{f_{AB}f_{\alpha\beta}+f_{A\beta}f_{\alpha B}} \tag{11.8} \\
       &= \frac{f_{AB}f_{\alpha\beta}-f_{A\beta}f_{\alpha B}}{f_{AB}f_{\alpha\beta}+f_{A\beta}f_{\alpha B}} \tag{11.9}
\end{align*}

as a measure of association. It has been called the coefficient of
association between A and B and is due to Yule. It may be seen that
Q satisfies all the desiderata stated above. In particular, $Q_{AB}=0$ if
and only if $\delta_{AB}=0$, i.e. if and only if A and B are independent. Its
lowest possible value (−1) occurs when $f_{AB}f_{\alpha\beta}=0$, i.e. when $f_{AB}=0$
and/or $f_{\alpha\beta}=0$, i.e. when there is complete negative association between
A and B. Likewise, its highest possible value (+1) occurs when
there is complete positive association between A and B.
A measure having the same general properties as $Q_{AB}$ is the
coefficient of colligation $Y_{AB}$, also due to Yule and defined by

\[
Y_{AB} = \frac{\sqrt{f_{AB}f_{\alpha\beta}}-\sqrt{f_{A\beta}f_{\alpha B}}}
              {\sqrt{f_{AB}f_{\alpha\beta}}+\sqrt{f_{A\beta}f_{\alpha B}}}. \tag{11.10}
\]


There is yet a third measure, viz.

\[
V_{AB} = \frac{n\delta_{AB}}{\sqrt{f_Af_\alpha f_Bf_\beta}}
       = \frac{f_{AB}f_{\alpha\beta}-f_{A\beta}f_{\alpha B}}{\sqrt{f_Af_\alpha f_Bf_\beta}}. \tag{11.11}
\]

This has properties similar to those of Q and Y, but unlike Q and Y,
$V=\pm1$ when and only when there is absolute association between the
two characters.

To prove this result, let us use the symbols a, b, c and d for $f_{AB}$,
$f_{A\beta}$, $f_{\alpha B}$ and $f_{\alpha\beta}$, respectively. Then

\[
V_{AB} = \frac{ad-bc}{\sqrt{(a+b)(c+d)(a+c)(b+d)}},
\]

and this equals $\pm1$ if and only if

\[
(ad-bc)^2 = (a+b)(c+d)(a+c)(b+d),
\]

i.e. if and only if

\[
a^2(bc+bd+cd)+b^2(ac+ad+cd)+c^2(ab+ad+bd)+d^2(ac+ab+bc)+4abcd=0. \tag{11.12}
\]

But this expression can vanish only if at least two of the non-
negative quantities a, b, c and d vanish. We assumed, however,
that the marginal frequencies are all non-zero, precluding the cases
a=b=0, c=d=0, a=c=0, and b=d=0. Hence $V_{AB}=\pm1$ if and only
if b=c=0 or a=d=0. In the former case there is absolute positive
association between A and B and $V_{AB}=+1$, while in the latter there
is absolute negative association and $V_{AB}=-1$.

Example 11.1 For the data of Table 11.1, let us denote by A the
attribute ‘outcome’ and by B the attribute ‘precautionary measure’.
We then have for the data

\begin{align*}
Q_{AB} &= \frac{19\times2741-193\times587}{19\times2741+193\times587}
        = \frac{52079-113291}{52079+113291} \\
       &= -61212/165370 = -0.37015, \\
Y_{AB} &= \frac{\sqrt{52079}-\sqrt{113291}}{\sqrt{52079}+\sqrt{113291}}
        = \frac{228.208-336.587}{228.208+336.587} \\
       &= -108.379/564.795 = -0.19189,
\end{align*}

while

\begin{align*}
V_{AB} &= \frac{19\times2741-193\times587}{\sqrt{212\times3328\times606\times2934}}
        = \frac{52079-113291}{\sqrt{125444583\times10^4}} \\
       &= -61212/(11200\times10^2) = -0.05465.
\end{align*}


Each of the measures indicates only a slight negative association 
between the two attributes. In other words, there is only slight 
evidence in support of the belief that use of quinine is generally 
followed by exemption from attack of malaria. 
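
The three coefficients of Example 11.1 can be recomputed directly from the four cell-frequencies. This is a minimal sketch: the function name `association_measures` is an assumption made for illustration, and the arguments a, b, c, d stand for $f_{AB}$, $f_{A\beta}$, $f_{\alpha B}$ and $f_{\alpha\beta}$ as in the proof above.

```python
import math

def association_measures(a, b, c, d):
    """Return (Q, Y, V) for a 2x2 table with cells a, b, c, d.

    Q: Yule's coefficient of association, (ad - bc) / (ad + bc).
    Y: coefficient of colligation, using square roots of ad and bc.
    V: (ad - bc) divided by the square root of the product of the
       four marginal frequencies (all assumed non-zero).
    """
    ad, bc = a * d, b * c
    q = (ad - bc) / (ad + bc)
    y = (math.sqrt(ad) - math.sqrt(bc)) / (math.sqrt(ad) + math.sqrt(bc))
    v = (ad - bc) / math.sqrt((a + b) * (c + d) * (a + c) * (b + d))
    return q, y, v

# Cell-frequencies of the quinine data used in Example 11.1.
q, y, v = association_measures(19, 193, 587, 2741)
```

To five decimal places the sketch gives Q = −0.37015, Y = −0.19189 and V = −0.05465, matching the hand computation.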

One important point is to be noted in this connection. The 
notion of independence is, by its very nature, related to a population 
and so is the notion of association. However, it is perfectly legitimate 
to study the presence or absence of association in the population 
from sample data. We may thus compute a measure of association
according to one of the formulae given above, where n is now to
be regarded as the sample size and the frequencies as the sample
frequencies for the cells or the margins. As in many other cases,
the sample measure is to be taken, at least for large n, as a good
approximation to the corresponding population value. Indeed, the
data of Table 11.1 are, more appropriately, to be considered to be
sample data for a random sample of size 3,540 taken from the
population of all adults in the given Indian State.


11.4 Manifold two-way (k×l) classification

We shall now discuss cases where again there are two attributes,
but at least one of which occurs in more than two forms. A two-way
classification of this type is given in Table 11.2. The data relate to
830 professional workers living in Indian towns and cities, who were
interviewed during a survey.

TABLE 11.2
CLASSIFICATION OF 830 PROFESSIONAL WORKERS ACCORDING
TO OCCUPATION GROUP AND ACTIVITY STATUS

                                          Activity status
Occupation group                Employees   Employers   Own-account   Total
                                                        workers

Scientists and technicians         169          21          140         330
Medical and health services         83          25           68         176
Teachers                           286          10           28         324

Total                              538          56          236         830


The two attributes may again be denoted by A and B. Let A
occur in one of k forms: $A_1, A_2, \ldots, A_k$; and let B occur in one
of l forms: $B_1, B_2, \ldots, B_l$. Suppose, of the n individuals under
study, $f_{ij}$ have the form $A_i$ of A together with the form $B_j$ of B.
Then $f_{ij}$ is the cell-frequency of the (i, j)th cell or of the combination
$A_iB_j$;

\[
f_{i0} = \sum_j f_{ij} \quad (i=1, 2, \ldots, k) \tag{11.13}
\]

is the marginal frequency of $A_i$; and

\[
f_{0j} = \sum_i f_{ij} \quad (j=1, 2, \ldots, l) \tag{11.14}
\]

is the marginal frequency of $B_j$. Of course,

\begin{align*}
n &= \sum_i\sum_j f_{ij} \tag{11.15a} \\
  &= \sum_i f_{i0} \tag{11.15b} \\
  &= \sum_j f_{0j}. \tag{11.15c}
\end{align*}

The frequencies $f_{ij}$ (together with n) define the joint distribution
of A and B; $f_{i0}$ define the marginal distribution of A and $f_{0j}$ the
marginal distribution of B. The k conditional distributions of B for
given forms of A are represented by the k columns of the two-way
frequency table, while the l conditional distributions of A for given
forms of B are represented by the l rows of the table.

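As in the 2×2 case, here too we shall assume that

\[
f_{i0}>0 \ \text{for each } i
\]

and

\[
f_{0j}>0 \ \text{for each } j.
\]

In this case, we may consider A and B to be unrelated or statistically
independent if

\[
\frac{f_{1j}}{f_{10}} = \frac{f_{2j}}{f_{20}} = \cdots = \frac{f_{kj}}{f_{k0}}
\]

for each j or, equivalently, if

\[
f_{ij} = \frac{f_{i0}f_{0j}}{n} \quad (\text{all } i, j). \tag{11.16}
\]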

The equations in (11.16) are used to verify that the two attributes
are really independent. Note that of these kl only $(k-1)(l-1)$
are algebraically independent equations, there being a number of
constraints, as implied by (11.13)–(11.15).
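
The check described by (11.16) amounts to comparing each cell-frequency with the product of its two marginal frequencies divided by n. The sketch below assumes a table given as a list of rows; the function names and the two small tables are hypothetical illustrations, not data from the text.

```python
def expected_under_independence(table):
    """Return the values f_i0 * f_0j / n that (11.16) requires of each cell."""
    row_totals = [sum(row) for row in table]          # f_i0
    col_totals = [sum(col) for col in zip(*table)]    # f_0j
    n = sum(row_totals)
    return [[ri * cj / n for cj in col_totals] for ri in row_totals]

def is_independent(table, tol=1e-9):
    """True if every cell-frequency equals its value under independence."""
    expected = expected_under_independence(table)
    return all(
        abs(f - e) <= tol
        for f_row, e_row in zip(table, expected)
        for f, e in zip(f_row, e_row)
    )

# Hypothetical tables, chosen only to illustrate the two outcomes.
proportional = [[10, 20, 30], [20, 40, 60]]   # rows proportional: independent
skewed = [[10, 20], [30, 5]]                  # rows not proportional: associated
```

In the first table every row has the same proportions, so all six equations hold; in the second they fail, and the attributes would be called associated.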


If, on the other hand,

\[
f_{ij} \ne \frac{f_{i0}f_{0j}}{n} \tag{11.17}
\]

for any pair (i, j), then A and B will be said to be associated.
It should be realised that in a manifold two-way classification
generally it will not be meaningful to make a distinction between
positive and negative association. However, this distinction will
have a meaning in case the classification with respect to each attribute
involves an implied ranking, as when, e.g., a group of students
is classified according to, say, proficiency in Subject I and in Subject
II into five categories each: very good, good, mediocre, bad and
poor. If students who are very good in Subject I are also generally
found to be very good in Subject II, those who are good in Subject
I are generally good in Subject II, and so on, then the two attributes
may be said to be positively associated. If, on the other hand, those
who are very good in Subject I are generally poor in Subject II,
those who are good in Subject I are generally bad in Subject II,
and so on, then the attributes may be considered to be negatively
associated.


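As in the 2×2 case, here too, in constructing a measure of
association we shall make use of the differences

$$\delta_{ij} = f_{ij} - \frac{f_{i0} f_{0j}}{n} \qquad (11.18)$$

between the actual cell-frequencies and the values they should
assume if the characters $A$ and $B$ are, in fact, independent. The
quantity

$$\chi^2_{AB} = \sum_i \sum_j \frac{\delta_{ij}^2}{f_{i0} f_{0j}/n} \qquad (11.19)$$
$$\quad\;\; = n\left[\sum_i \sum_j \frac{f_{ij}^2}{f_{i0} f_{0j}} - 1\right] \qquad (11.20)$$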


may serve as a measure. This is zero if and only if $A$ and $B$ are
independent (i.e. if and only if $\delta_{ij} = 0$ for all $i, j$), and the higher the
strength of association, the higher is the value of $\chi^2_{AB}$. However, $\chi^2_{AB}$
depends too much on the total frequency $n$, and theoretically it can
be infinitely large. A measure that does not suffer from this defect
is Karl Pearson’s coefficient of contingency,

$$C_{AB} = \sqrt{\frac{\chi^2_{AB}}{n + \chi^2_{AB}}}. \qquad (11.21)$$


$C_{AB}$ equals zero if and only if $\delta_{ij} = 0$ for each $i, j$ (i.e. if and only if
$\chi^2_{AB} = 0$). However, it has the defect that its least upper bound is less
than unity. (For $\chi^2_{AB} \ge 0$ and $n > 0$, so that $\chi^2_{AB} < n + \chi^2_{AB}$, and hence
necessarily $C_{AB} < 1$.) In other words, it does not attain the value
unity even if $A$ and $B$ are perfectly associated.


Consider, e.g., a table
with $k$ classes for each of the two attributes, where every diagonal
frequency $f_{ii} > 0$ and $f_{ij} = 0$ for every non-diagonal cell. Surely, no
greater degree of association than this can be imagined in the $k \times k$
case. Yet, since $f_{ij} = 0$ for $i \ne j$ and $f_{ii} = f_{i0} = f_{0i}$,

$$\chi^2_{AB} = n\left[\sum_i \sum_j \frac{f_{ij}^2}{f_{i0} f_{0j}} - 1\right]
= n\left[\sum_i \frac{f_{ii}^2}{f_{i0} f_{0i}} - 1\right] = n(k-1),$$

and so

$$C_{AB} = \sqrt{\frac{n(k-1)}{n + n(k-1)}} = \sqrt{\frac{k-1}{k}} < 1.$$

To remove the stated defect, Tschuprow suggests an alternative
coefficient:

$$T_{AB} = \sqrt{\frac{\chi^2_{AB}}{n\sqrt{(k-1)(l-1)}}}. \qquad (11.22)$$

Like $C_{AB}$, $T_{AB}$ vanishes if and only if $A$ and $B$ are independent.
But unlike $C_{AB}$, if the attributes $A$ and $B$ are perfectly associated in
a $k \times k$ table, then

$$T_{AB} = \sqrt{\frac{n(k-1)}{n(k-1)}} = 1.$$

However, not much is known about the behaviour of $T_{AB}$ in the
case of $k \times l$ tables with $k \ne l$.
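The behaviour of the two coefficients on a perfectly associated table can be checked numerically. The sketch below is illustrative only: the 4 × 4 diagonal table is hypothetical and the function names are ours. It uses the form (11.20) for $\chi^2_{AB}$ and assumes all marginal totals are positive, as the text does.

```python
import math

def chi_square(table):
    """chi^2 = n * (sum of f_ij^2 / (f_i0 * f_0j) - 1), the form in (11.20)."""
    rows = [sum(r) for r in table]
    cols = [sum(c) for c in zip(*table)]
    n = sum(rows)
    ratio_sum = sum(f * f / (rows[i] * cols[j])
                    for i, r in enumerate(table) for j, f in enumerate(r))
    # Guard against a tiny negative value produced by rounding.
    return max(n * (ratio_sum - 1.0), 0.0)

def pearson_c(table):
    """Pearson's coefficient of contingency, (11.21)."""
    chi2 = chi_square(table)
    n = sum(map(sum, table))
    return math.sqrt(chi2 / (n + chi2))

def tschuprow_t(table):
    """Tschuprow's coefficient, (11.22)."""
    chi2 = chi_square(table)
    n = sum(map(sum, table))
    k, l = len(table), len(table[0])
    return math.sqrt(chi2 / (n * math.sqrt((k - 1) * (l - 1))))

# Hypothetical 4 x 4 table with every individual on the diagonal.
diagonal = [[12, 0, 0, 0],
            [0, 30, 0, 0],
            [0, 0, 7, 0],
            [0, 0, 0, 51]]

print(chi_square(diagonal), pearson_c(diagonal), tschuprow_t(diagonal))
```

With $k = 4$ and $n = 100$ the first value should equal $n(k-1) = 300$, the second $\sqrt{3/4}$ and the third unity, in line with the argument above.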


Example 11.2 Consider the data of Table 11.2. Here for the two
attributes, say $A$ and $B$,

$$\chi^2_{AB} = 830\left[\frac{(169)^2}{538 \times 330} + \frac{(83)^2}{538 \times 176}
+ \frac{(286)^2}{538 \times 324} + \cdots + \frac{(28)^2}{236 \times 324} - 1\right].$$


TABLE 11.3
COMPUTATION OF MEASURES OF ASSOCIATION
FOR THE DATA OF TABLE 11.2

  $f_{ij}$    $f_{i0} \times f_{0j}$    $f_{ij}^2/(f_{i0} \times f_{0j})$
   169        177540        0.16087
    83         94688        0.07275
   286        174312        0.46925
    21         18480        0.02386
    25          9856        0.06341
    10         18144        0.00551
   140         77880        0.25167
    68         41536        0.11133
    28         76464        0.01025
  Total                     1.16890


From the above table, we get

$$\chi^2_{AB} = 830 \times 0.16890 = 140.187,$$

so that

$$C_{AB} = \sqrt{\frac{140.187}{830 + 140.187}} = \sqrt{0.14449} = 0.38012,$$

and

$$T_{AB} = \sqrt{\frac{140.187}{830 \times 2}} = \sqrt{0.08445} \approx 0.2906.$$

Both indicate a moderate degree of association between the
attributes.
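The arithmetic of Example 11.2 can be reproduced with a short Python sketch. The nine cell frequencies are those listed in Table 11.3; arranging them three to a row is our assumption, chosen because it reproduces the marginal products shown in that table. The variable names are ours.

```python
import math

# Cell frequencies from Table 11.3, grouped three to a row (assumed layout).
table = [[169, 83, 286],
         [21, 25, 10],
         [140, 68, 28]]

rows = [sum(r) for r in table]          # marginal totals along the rows
cols = [sum(c) for c in zip(*table)]    # marginal totals along the columns
n = sum(rows)                           # total frequency

# Sum of f_ij^2 / (f_i0 * f_0j) over all nine cells, then formula (11.20).
ratio_sum = sum(f * f / (rows[i] * cols[j])
                for i, r in enumerate(table) for j, f in enumerate(r))
chi2 = n * (ratio_sum - 1.0)

c_ab = math.sqrt(chi2 / (n + chi2))                          # (11.21)
t_ab = math.sqrt(chi2 / (n * math.sqrt((3 - 1) * (3 - 1))))  # (11.22)

print(f"chi2 = {chi2:.3f}, C = {c_ab:.4f}, T = {t_ab:.4f}")
```

The unrounded computation may differ from the tabulated 140.187 in the second decimal place, since Table 11.3 rounds each ratio to five decimals before summing; the coefficients agree with the text to about three decimals.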


11.5 Case of more than two attributes 

When data are collected simultaneously on more than two 
attributes, one has to use suitable modifications and extensions of 
the principles set forth in the preceding sections. In Exercise 11.12 
we have a set of data of this type. We shall use a notation 
similar to that used in the case of two attributes. Thus, e.g., if
we have three attributes, $A$ (having $r$ forms: $A_1, A_2, \ldots, A_r$),
$B$ (having $s$ forms: $B_1, B_2, \ldots, B_s$) and $C$ (having $t$ forms: $C_1, C_2,
\ldots, C_t$), we shall denote by $f_{ijk}$ the frequency in the $(i, j, k)$th cell
and by $n$ the total frequency. Also, we shall write

$$f_{i00} = \sum_j \sum_k f_{ijk}, \quad f_{0j0} = \sum_i \sum_k f_{ijk} \quad \text{and} \quad f_{00k} = \sum_i \sum_j f_{ijk},$$
$$f_{ij0} = \sum_k f_{ijk}, \quad f_{i0k} = \sum_j f_{ijk} \quad \text{and} \quad f_{0jk} = \sum_i f_{ijk}. \qquad (11.23)$$

Besides the joint distribution of $A$, $B$ and $C$, defined by $f_{ijk}$, here
we shall have two kinds of marginal distribution. In the first group
will be the marginal distributions of $A$, $B$ and $C$, defined respectively
by $f_{i00}$ (for varying $i$), $f_{0j0}$ and $f_{00k}$; to the second group will belong
the marginal distributions of $A$ and $B$, of $A$ and $C$ and of $B$ and $C$
given, respectively, by $f_{ij0}$, $f_{i0k}$ and $f_{0jk}$. There will also be two kinds
of conditional distributions. First, we shall have the conditional
distributions of $A$ for given forms of $B$ and $C$, of $B$ for given forms of
$A$ and $C$ and of $C$ for given forms of $A$ and $B$. Secondly, we shall
have the conditional distributions of $A$ and $B$ for given forms of $C$, of
$A$ and $C$ for given forms of $B$, and of $B$ and $C$ for given forms of $A$.

To discuss the types of problem that may arise when the data
relate to more than two attributes, we need consider the case of
three attributes only.
We may, first of all, want to investigate whether the attributes
may be supposed to be mutually independent. The attributes $A$, $B$ and
$C$, e.g., will be said to be mutually independent if the equations

$$f_{ijk} = \frac{f_{i00}\, f_{0j0}\, f_{00k}}{n^2} \qquad (11.24)$$

hold for all $i$, $j$ and $k$. Not all these are algebraically independent
equations. The number of independent equations is
$(r-1)(s-1) + (r-1)(t-1) + (s-1)(t-1) + (r-1)(s-1)(t-1) = rst - r - s - t + 2$.
E.g., if each attribute has just two forms ($A$ and $\alpha$, $B$ and $\beta$, and $C$
and $\gamma$), then the following 4 equations would be enough to ensure
that $A$, $B$ and $C$ are independent attributes:

$$f_{AB} = f_A f_B / n,$$
$$f_{AC} = f_A f_C / n,$$
$$f_{BC} = f_B f_C / n$$
$$\text{and} \quad f_{ABC} = f_A f_B f_C / n^2, \qquad (11.25)$$


where the symbols conform to the notation used in Section 11.2. As
a measure of the joint association of the attributes, i.e. of the departure
from mutual independence, we may use a variant of $C$ or $T$, e.g.

$$C_{ABC} = \sqrt{\frac{\chi^2_{ABC}}{n + \chi^2_{ABC}}}, \qquad (11.26)$$

where

$$\chi^2_{ABC} = \sum_i \sum_j \sum_k \frac{\left(f_{ijk} - f_{i00} f_{0j0} f_{00k}/n^2\right)^2}{f_{i00} f_{0j0} f_{00k}/n^2}.$$
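A minimal Python sketch of the joint measure follows. The three-way tables are hypothetical, the nested-list layout `f[i][j][k]` and the function names are our own choices, and all one-way marginal totals are assumed positive.

```python
import math

def joint_chi_square(f):
    """Return (chi^2_ABC, n) for a three-way table f[i][j][k] of counts."""
    r, s, t = len(f), len(f[0]), len(f[0][0])
    n = sum(f[i][j][k] for i in range(r) for j in range(s) for k in range(t))
    f_i00 = [sum(f[i][j][k] for j in range(s) for k in range(t)) for i in range(r)]
    f_0j0 = [sum(f[i][j][k] for i in range(r) for k in range(t)) for j in range(s)]
    f_00k = [sum(f[i][j][k] for i in range(r) for j in range(s)) for k in range(t)]
    chi2 = 0.0
    for i in range(r):
        for j in range(s):
            for k in range(t):
                # Frequency expected under mutual independence, as in (11.24).
                expected = f_i00[i] * f_0j0[j] * f_00k[k] / n ** 2
                chi2 += (f[i][j][k] - expected) ** 2 / expected
    return chi2, n

def joint_contingency(f):
    """C_ABC as in (11.26)."""
    chi2, n = joint_chi_square(f)
    return math.sqrt(chi2 / (n + chi2))

# Hypothetical 2 x 2 x 2 table built as a product of marginal proportions,
# so that the three attributes are mutually independent.
independent = [[[8, 24], [8, 24]],
               [[24, 72], [24, 72]]]
# The same total frequency with some individuals moved between cells.
perturbed = [[[20, 12], [8, 24]],
             [[12, 84], [24, 72]]]

print(joint_contingency(independent), joint_contingency(perturbed))
```

The first value should be zero and the second positive, reflecting the departure from mutual independence.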


However, more commonly one will want to examine to what
extent one attribute, considered to be of special importance, may be
said to be associated with the others taken together. The attributes
in the second group will then be lumped together, thus giving a
two-way classification. E.g., if there be $p$ attributes in all, each with
2 forms, then the two-way classification will have 2 classes in one
direction and $2^{p-1}$ classes in the other. With the attributes $A$, $B$
and $C$ occurring, respectively, in $r$, $s$ and $t$ forms, as in the preceding
paragraph, $A$ will be said to be independent of or associated with $B$
and $C$ taken together, according as the following identities hold or
do not hold simultaneously*:

$$f_{ijk} = \frac{f_{i00}\, f_{0jk}}{n}. \qquad (11.27)$$


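As a measure of the multiple association of $A$ with $B$ and $C$, we may
then use a variant of Pearson’s coefficient of contingency or of
Tschuprow’s $T$:

$$C_{A.BC} = \sqrt{\frac{\chi^2_{A.BC}}{n + \chi^2_{A.BC}}} \qquad (11.28)$$

or

$$T_{A.BC} = \sqrt{\frac{\chi^2_{A.BC}}{n\sqrt{(r-1)(st-1)}}}, \qquad (11.29)$$

where

$$\chi^2_{A.BC} = \sum_i \sum_j \sum_k \frac{\left(f_{ijk} - f_{i00} f_{0jk}/n\right)^2}{f_{i00} f_{0jk}/n}
= n\left[\sum_i \sum_j \sum_k \frac{f_{ijk}^2}{f_{i00} f_{0jk}} - 1\right].$$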
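Lumping $B$ and $C$ together reduces the problem to a two-way table with $r$ classes one way and $st$ the other, so the two-way formulae can be reused. The Python sketch below illustrates this under our own assumptions: the data are hypothetical, the layout `f[i][j][k]` and the function names are ours, and all marginal totals are taken to be positive.

```python
import math

def lump_bc(f):
    """Flatten f[i][j][k] into a two-way table with r rows and s*t columns."""
    return [[cell for b_row in a_slice for cell in b_row] for a_slice in f]

def chi_square(table):
    """Return (chi^2, n) for a two-way table, using the n*(sum - 1) form."""
    rows = [sum(r) for r in table]
    cols = [sum(c) for c in zip(*table)]
    n = sum(rows)
    ratio_sum = sum(x * x / (rows[i] * cols[j])
                    for i, r in enumerate(table) for j, x in enumerate(r))
    # Guard against a tiny negative value produced by rounding.
    return max(n * (ratio_sum - 1.0), 0.0), n

def multiple_association(f):
    """Return (C_A.BC, T_A.BC) as in (11.28) and (11.29)."""
    table = lump_bc(f)
    chi2, n = chi_square(table)
    r, st = len(table), len(table[0])
    c = math.sqrt(chi2 / (n + chi2))
    t = math.sqrt(chi2 / (n * math.sqrt((r - 1) * (st - 1))))
    return c, t

# Hypothetical data: B and C are associated with each other, but A is
# independent of the pair (the second slice is three times the first).
a_independent = [[[30, 10], [10, 30]],
                 [[90, 30], [30, 90]]]
# Hypothetical data in which A is associated with B and C taken together.
a_associated = [[[30, 10], [10, 30]],
                [[30, 90], [90, 30]]]

print(multiple_association(a_independent), multiple_association(a_associated))
```

In the first table both coefficients should vanish even though $B$ and $C$ are themselves associated, which is the point of treating the pair as a single classification.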


*Of these only $(r-1)(st-1)$ are independent.

Still more commonly, we shall have two attributes of primary
importance, say $A$ and $B$, considered along with some others which
may have some influence on the former. In examining the
association between $A$ and $B$ from a table like Table 11.1 or Table 11.2,
all other characters are ignored, and hence this may be called
total association. But we may like to take the influence of the other
characters on both $A$ and $B$ into account. This will necessitate the
measurement of the association between $A$ and $B$ for each
combination of forms of the other attributes. Consider, for illustration, the
case where $A$ and $B$ are studied together with one other attribute $C$,
each of these attributes having two forms only. Then for each of the
forms $C$ and $\gamma$ of the attribute $C$, we shall have to get a measure of
association. These may be called the coefficient of partial association
between $A$ and $B$ in the ‘presence’ of $C$ and the coefficient of partial
association between $A$ and $B$ in the presence of $\gamma$ (i.e. in the ‘absence’
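of $C$), and are given by the formulae:

$$Q_{AB.C} = \frac{f_C\,\delta_{AB.C}}{f_{ABC} f_{\alpha\beta C} + f_{A\beta C} f_{\alpha BC}}
= \frac{f_{ABC} f_{\alpha\beta C} - f_{A\beta C} f_{\alpha BC}}{f_{ABC} f_{\alpha\beta C} + f_{A\beta C} f_{\alpha BC}} \qquad (11.30a)$$

and

$$Q_{AB.\gamma} = \frac{f_\gamma\,\delta_{AB.\gamma}}{f_{AB\gamma} f_{\alpha\beta\gamma} + f_{A\beta\gamma} f_{\alpha B\gamma}}
= \frac{f_{AB\gamma} f_{\alpha\beta\gamma} - f_{A\beta\gamma} f_{\alpha B\gamma}}{f_{AB\gamma} f_{\alpha\beta\gamma} + f_{A\beta\gamma} f_{\alpha B\gamma}}, \qquad (11.30b)$$

where

$$\delta_{AB.C} = f_{ABC} - \frac{f_{AC} f_{BC}}{f_C}$$

and

$$\delta_{AB.\gamma} = f_{AB\gamma} - \frac{f_{A\gamma} f_{B\gamma}}{f_\gamma}.$$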


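To see how the influence of $C$ on $A$ and $B$ may affect the
association between them, let us write $\delta_{AB.C}$ and $\delta_{AB.\gamma}$ in terms of $\delta_{AB}$, $\delta_{AC}$
and $\delta_{BC}$. (Note that the signs of the $Q$’s are the same as those of
the $\delta$’s.) We have

$$\begin{aligned}
\delta_{AB.C} + \delta_{AB.\gamma}
&= f_{AB} - \frac{f_{AC} f_{BC}}{f_C} - \frac{f_{A\gamma} f_{B\gamma}}{f_\gamma} \\
&= \delta_{AB} - \frac{n f_\gamma f_{AC} f_{BC} + n f_C f_{A\gamma} f_{B\gamma} - f_A f_B f_C f_\gamma}{n f_C f_\gamma} \\
&= \delta_{AB} - \frac{n}{f_C f_\gamma}\left(f_{AC} - \frac{f_A f_C}{n}\right)\left(f_{BC} - \frac{f_B f_C}{n}\right) \\
&= \delta_{AB} - \frac{n}{f_C f_\gamma}\,\delta_{AC}\,\delta_{BC}. \qquad (11.31)
\end{aligned}$$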
AB Tots 40°BC 
Suppose now that

$$\delta_{AB.C} = \delta_{AB.\gamma} = 0.$$

Then

$$\delta_{AB} = \frac{n}{f_C f_\gamma}\,\delta_{AC}\,\delta_{BC},$$

which may be a non-zero quantity. This means that $A$ and $B$ may
be independent when they are studied in the presence of $C$ and also
when they are studied in the absence of $C$. But, when $C$ is ignored,
they may appear to be associated simply because of the association of
$C$ with both $A$ and $B$ (i.e. because $C$ may be such that neither $\delta_{AC}$
nor $\delta_{BC}$ vanishes). The association between $A$ and $B$, as indicated
by $\delta_{AB}$, may thus be completely illusory. In the same way, (11.31)
shows that $\delta_{AB}$ may be zero even when $\delta_{AB.C}$ or $\delta_{AB.\gamma}$ is non-zero.
Thus the apparent independence of $A$ and $B$ may also be spurious,
being due to the effect of a third attribute ($C$) on them.
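Identity (11.31) and the possibility of an illusory total association can be checked on a small example. The 2 × 2 × 2 counts below are hypothetical and were chosen so that $A$ and $B$ are independent within each form of $C$; the indexing convention (index 0 for $A$, $B$, $C$ and index 1 for $\alpha$, $\beta$, $\gamma$) and the function name are ours.

```python
def deltas(f):
    """Compute the delta quantities of Section 11.5 from counts f[a][b][c].

    Index 0 stands for A, B, C and index 1 for alpha, beta, gamma.
    """
    idx = (0, 1)
    n = sum(f[a][b][c] for a in idx for b in idx for c in idx)
    f_A = sum(f[0][b][c] for b in idx for c in idx)
    f_B = sum(f[a][0][c] for a in idx for c in idx)
    f_AB = f[0][0][0] + f[0][0][1]
    out = {"AB": f_AB - f_A * f_B / n}
    for c, name in ((0, "C"), (1, "gamma")):
        f_c = sum(f[a][b][c] for a in idx for b in idx)
        f_Ac = f[0][0][c] + f[0][1][c]
        f_Bc = f[0][0][c] + f[1][0][c]
        out["AB." + name] = f[0][0][c] - f_Ac * f_Bc / f_c
        if c == 0:
            out["AC"] = f_Ac - f_A * f_c / n
            out["BC"] = f_Bc - f_B * f_c / n
            f_C = f_c
    # Right-hand side of (11.31), using f_gamma = n - f_C.
    out["rhs"] = out["AB"] - n / (f_C * (n - f_C)) * out["AC"] * out["BC"]
    return out

# Hypothetical counts: within C the cells are 64, 16, 16, 4 and within
# gamma they are 4, 16, 16, 64, so each stratum shows no association.
f = [[[64, 4], [16, 16]],
     [[16, 16], [4, 64]]]

d = deltas(f)
print(d["AB.C"], d["AB.gamma"], d["AB"], d["rhs"])
```

Both partial differences should be zero while $\delta_{AB}$ is not, so the apparent association between $A$ and $B$ in this example arises entirely from the association of each with $C$.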


11.6 Association and causal relationship 

It should be obvious to the reader that an association between
two attributes need not imply a causal relationship. For an
association between two attributes, A and B, may be due to (a) A being a
cause of B or (b) B being a cause of A or (c) both being caused by
some other character or group of characters. Only in cases (a) and
(b) is the relationship between A and B of the causal (i.e.
cause-effect) type. To make sure that situation (c) does not obtain, we
should study A and B together with other characters, say C, D, etc.,
which are likely to have an influence on the former. That is to say,
we should separately measure the association between A and B for
each combination of forms of the other characters. The reason is
that an apparent association between A and B may actually be due
to the effect of those other characters on them, as has been seen in
the last section.




For instance, we have seen in Example 1.11 that the taking of 
quinine as a precautionary measure (A) is negatively associated with 
attack of malaria (B). Even so, it would be unwise to jump to the 
conclusion that the use of quinine provides protection against attack 
of malaria. It may be that the economic condition (C) of the 
people examined has something to do with the observed association. 
Consider a classification of the people according to C into two 
groups : rich (C) and poor (y). It is well known that rich people 
are more health-conscious than the poor and are more likely to 
afford the use of quinine as a precautionary measure. Hence A and 
C are likely to be positively associated. Again, the rich live in more
hygienic conditions than the poor and hence are less likely to be
attacked with malaria. Thus B and C are likely to be negatively
associated. And the non-zero association between A and B may
actually be due to the non-zero association of each of them with C. 

As such, while using a measure of association in looking for a 
causal relationship between two characters, A and B, we should
consider them in conjunction with other characters, C, D, etc., that 
are likely to have an effect on them. Only when A and B are found 
to be associated for fixed combinations of forms of these characters 
will it be proper to say that one of them is a cause of the other. 


11.7 Smoking and lung cancer 

The way in which the notion of association can be used in 
looking for a possible causal connection between two phenomena 
and the caution that one should exercise in interpreting the 
association between them are well illustrated by the so-called 
‘cancer controversy’. 

In the beginning of the 20th century, statisticians noted an 
alarming rise in the incidence of lung cancer. While part of the 
rising trend might be attributed to improvements in diagnosis and the 
changing size and age-composition of the population, the evidence 
left little doubt that a true increase had taken place. With the
increase in the incidence of lung cancer, suspicions regarding the
possible ill effects of smoking became deeper when medical men
observed that lung cancer patients were predominantly heavy
smokers of tobacco.




It was in 1952 that a remarkable piece of research was carried
out on this issue by Bradford Hill and his colleagues. They picked
from a number of hospitals in London about 1,500 patients who
had been diagnosed as suffering from lung cancer and data were
collected on their smoking habits. A similar number of non-cancer
patients from the same hospitals were asked the same set of questions
on their smoking habits. A comparison of the two groups showed
that cigarette-smokers were more common among the lung cancer
patients than among the other patients. Further, as to the cigarette- 
smokers themselves, heavy smokers were found to be more common 
among the lung cancer patients than medium or light smokers. In 
the course of the next few years, more investigations on somewhat 
similar lines were conducted in different parts of the world, and 
they all confirmed Hill’s findings. The British Medical Association 
thereupon started a crusade against smoking, demanding that the 
hazards of smoking “must be brought home to the public by all the 
modern devices of publicity”. 

However, R.A. Fisher, the celebrated British statistician, through 
a series of writings and lectures [1], deprecated this attempt on the 
part of the BMA “to plant fear in the minds of perhaps a hundred 
million smokers throughout the world—to plant it with the aid 
of all the means of modern publicity backed by public money”.
Hill’s enquiry, Fisher said, had only made a good prima facie 
case against smoking. Further investigations subsequent to Hill’s 
consisted very largely of the same type of observations and hence 
suffered from similar limitations. Fisher stressed that even if one 
admitted that the findings of Hill and others had established a 
significant positive association between cigarette-smoking and lung 
cancer, it might not indicate that the former causes the latter. 
For one thing, is it possible that lung cancer (rather, the pre- 
cancerous condition involving slight chronic inflammation which 
exists for years in those who are going to show overt lung cancer) 
is a cause of smoking cigarettes? Maybe, this condition leads 
people to derive from smoking the same kind of satisfaction as 
one derives from it after a slight irritation or disappointment. 
For another (and Fisher thought this to be more likely), it may
be that a common cause explains the observed association. The
obvious common cause is the genotype. Genotypic differences in
men are expected to lead to differences in their susceptibility to
lung cancer. Such differences are also likely to lead to differences
in their smoking habits.

Although it was insinuated from some quarters that Fisher had 
entered the controversy at the behest of the cigarette-manufacturers, 
actually Fisher was perfectly logical in pointing out that the BMA 
was being too hasty in directing its guns towards smoking in its 
fight against lung cancer. 

Anyway, it was important that the controversy should be set at 
rest and the world told the truth about the relationship between 
smoking and lung cancer. A thorough review of the investigations 
made till then came in the form of a report by the Royal College of 
Physicians of London. Their study, started in 1959 and continued 
till 1962, led to the conclusion that cigarette-smoking is indeed a 
cause of lung cancer. The same conclusion was reached in the 
course of a more thorough review made under the aegis of the 
U.S. Government. In early 1962, the U.S. Government set up 
a committee of experts—physicians, chemists, biochemists and 
statisticians—to make a thorough enquiry into the matter. Its 
report, briefly called the Surgeon-General’s Report [5], came out
in January, 1964.

The committee based its report principally on the findings of 29 
retrospective and 7 prospective studies that had been conducted till 
1963 in different parts of the world. In a retrospective study, the 
smoking histories of persons with a specified disease (e.g. lung
cancer) would be compared with those of persons without the 
disease. In a prospective study, on the other hand, a comparison 
would be made between the death rates of smokers and non-smokers, 
both over-all and for specific causes of death. This is done by first 
recording the smoking habits of people and then obtaining death 
certificates for those who die after entering the study. 

Now, these studies indicated that a significant association does
exist between smoking and lung cancer.

The committee noted that, while a direct experiment in man to
test whether a causal relationship exists between smoking and lung
cancer is not feasible, a considerable amount of experimental work
in many species of animals had shown that certain compounds
identified in cigarette smoke can produce cancer. There are other
substances in tobacco and smoke which, though not
cancer-producing themselves, promote cancer production.

Second, the association was found to be consistent in the sense 
that none of the prospective studies and none but one of the retro- 
spective studies showed results to the contrary. Thus, despite many 
variations in design and method, all the retrospective studies, except 
one which dealt with females, showed that there were propor- 
tionately more cigarette-smokers among lung cancer patients than 
among people without cancer. 


Third, in the 9 retrospective studies in which relative risk ratios 
for smokers and non-smokers were calculated and in all 7 prospective 
studies, the relative risk ratios were uniformly high and fairly close 
to each other, thus attesting to the strength of the association. 
Moreover, a dose-effect phenomenon was apparent in the sense that 
the relative risk ratio was found to increase with the amount of 
tobacco or number of cigarettes consumed.

Fourth, the suggestion that genetic influences might underlie 
both the tendency to smoke and the tendency towards lung cancer 
was also examined by the committee and was found to be at
variance with facts. For one thing, the great rise in the incidence
of lung cancer among males that has occurred in recent decades 
points to the introduction of new factors without which the genotype 
would have little or no potency. Evidently, the genetic factors 
were not strong enough to cause lung cancer in large numbers of 
people under the environmental conditions that existed, say, half 
a century ago. And the possibility that the genetic constitution 
of man has changed simultaneously and identically in a large 
number of countries is also unlikely. For another, the risk of 
developing lung cancer has been found to diminish when smoking is 
discontinued, although the genetic constitution must have remained 
the same. 

On the above considerations, the committee came to the conclusion
that the high degree of association between smoking and lung
cancer ought to be interpreted to mean that the former is a (major) 
cause of the latter. 


Questions and exercises 


11.1 Given $n$, $f_A$, $f_B$ and $f_{AB}$, how would you find the other
cell-frequencies and marginal frequencies of the 2×2 table?

11.2 Consider three attributes, A, B and C, each occurring in
two forms. Given $n$, $f_A$, $f_B$, $f_C$, $f_{AB}$, $f_{AC}$, $f_{BC}$ and $f_{ABC}$, how would
you obtain the other frequencies?

11.3 On the basis of the performance of a group of students 
in a high school examination, the following statement was made : 

“Of the students concerned, 48% are good in English, 22% good 
in mathematics and 55% good in elementary science ; 33% are good 
in both English and mathematics, 32% in English and elementary 
science, and 38% in mathematics and elementary science ; 30% are 
good in all three.” 

Show that the figures, as they stand, must be incorrect. 

11.4 Examine the following statement for possible contradic- 
tions : 

“Of 520 commuters interviewed at Howrah Station, 385 were
found to be Government employees, 417 were smokers and 228 were 
in Western dress; 173 were both Government employees and 
smokers, 195 were Government employees in Western dress and 104 
were smokers in Western dress.” 

11.5 For the case of two attributes, define independence and 
association (positive and negative). What are the different measures 
of association, and what are their properties ? 

11.6 Show that the two forms of Q, (11.8) and (11.9), are 
equivalent and that Q increases monotonically as ad increases (where 
$a = f_{AB}$ and $d = f_{\alpha\beta}$), thus justifying the statement that Q changes
from —1 through 0 to +1 as one goes from complete negative 
association through independence to complete positive association. 

11.7 Show that

$$Q = \frac{2Y}{1 + Y^2}$$

and hence that Q is greater in absolute value than Y, except when
both are zero or ±1.




11.8 What is partial association ? Explain the relevance of this 
concept to the investigation of a causal relationship between two 
attributes. 

11.9 Examine the following statement : “One in every thousand 
smokers dies of lung cancer. Hence smoking must be a cause of 
lung cancer.” 

11.10 Compute a measure of association for the following data 
and comment. 


DEATHS FROM TUBERCULOSIS IN ENGLAND & WALES IN 1956

Tuberculosis of respiratory system | 3,534 | 1,319 | 4,853
Other forms of tuberculosis        |   270 |   252 |   522
Total                              | 3,804 | 1,571 | 5,375


Ans. Q=0.429.


11.11 For the following 3 x3 classification, compute a measure of 
association. Would it be proper to attach a sign to this measure ? 


DATA ON 413 MALE COLLEGE STUDENTS
(GIVING RESULTS OF A VISUAL ACUITY TEST AND A BALANCE TEST)

        | Left-eyed | Ambiocular | Right-eyed | Total
Total   |    174    |     71     |    168     |  413


Partial ans. C=0.076.


11.12 Would you say that the observed association between
A and B for the following data is real? Or would you ascribe it to
the influence of C on A and B? (A=type of high school education
received, B=performance at university, C=income.)

                                         | English-medium | Others | Total
Successful at university     High income |       38       |   97   |  135
                             Low income  |       13       |   61   |   74
                             Total       |       51       |  158   |  209
Unsuccessful at university   High income |       16       |  141   |  157
                             Low income  |       21       |  113   |  134
                             Total       |       37       |  254   |  291


Partial ans. $Q_{AB} = 0.378$, $Q_{AB.C} = 0.551$, $Q_{AB.\gamma} = 0.068$.


SUGGESTED READING 


[1] Fisher, R. A. Smoking, the Cancer Controversy (Some Attempts to 
Assess the Evidence). Oliver & Boyd, 1959. 

[2] Goodman, L. A. and Kruskal, W. H. “Measures of association”,
Jour. Am. Stat. Assocn., 49 (1954), pp. 732-, and 54 (1959),
pp. 123-.

[3] Kendall, M. G. and Stuart, A. Advanced Theory of Statistics,
Vol. II (Ch. 33). Charles Griffin, 1960.

[4] Kruskal, W. H. “Ordinal measures of association”, Jour. Am.
Stat. Assocn., 53 (1958), pp. 814-.

[5] U.S. Deptt. of Health, Education and Welfare. Smoking and 
Health : Report of the Advisory Committee to the Surgeon-General of 
the Public Health Service. Van Nostrand, 1964. 

[6] Yule, G. U. and Kendall, M. G. An Introduction to the Theory of 
Statistics (Chs. 1—3). Charles Griffin, 1950. 


12 BIVARIATE FREQUENCY 
DISTRIBUTIONS 


12.1 Bivariate data 

We discussed in the last chapter methods of summarisation of 
data arising out of variation in two attributes. We shall now take up 
for consideration the case of two variables. The variables may be
denoted by x and y. Thus x may be the height and y the weight of 
a person, or x may be the weight when it is green and y the weight 
of dry fibre for a jute plant. Our raw data will then consist of 
a number of pairs of values of x and y, each pair corresponding to 
a particular individual. As an example, we may consider the data 
of Table 12.1, which gives, for each of 20 undergraduate students, 
the marks obtained in statistics (Honours) in a college test (full 
marks 300) and in the subsequent university examination (full 
marks 600). 
TABLE 12.1

MARKS OBTAINED BY 20 UNDERGRADUATE STUDENTS
IN STATISTICS HONOURS IN A COLLEGE TEST AND
IN THE SUBSEQUENT UNIVERSITY EXAMINATION

Serial No. | Marks obtained in college test | Marks obtained in university examination
    ...    |              123               |                  326
    ...    |              121               |                  341
[The remaining rows of the table are not legible.]

TABLE 12.2
A BIVARIATE FREQUENCY TABLE, SHOWING INCOME AND PERCENTAGE OF
INCOME SPENT ON FOOD

[The number of families given in the title and the body of the table are
not legible. Columns: family income (in Rs.); rows: percentage of
income spent on food.]


When the data are considerably numerous, they may be
summarised by using a two-way frequency table. For each variable a suitable
number of classes are taken, keeping in view the same considerations
as in the univariate case. If there are k classes for x and l classes
for y, then there will be in all k×l cells in the two-way table. By
going through the pairs of values of x and y, we can then find the
number of individuals (or frequency) in each cell, perhaps using, for
the sake of convenience in counting, the system of tally marks. The
whole set of cell-frequencies will now define a frequency
distribution, called a bivariate frequency distribution (see Table 12.2).

In Table 12.2, the column-totals of frequencies show the numbers 
of individuals belonging to the corresponding x-classes, irrespective of 
their y-values (where x is taken to denote family income and y the 
percentage of family income spent on food). Thus they show the 
frequency distribution of x, called the marginal distribution of x in the 
present context. Similarly, the row-totals of frequencies give the 
marginal distribution of y. Again, if we consider a particular column
of frequencies, we find the number of individuals falling in each y-class 
for the given values of x. It may be called a conditional distribution or 
an array distribution of y for given x. Similarly, any particular row of 
frequencies gives a conditional or an array distribution of x for given y. 
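The construction of a bivariate frequency table, and the reading-off of marginal and array distributions, can be sketched as follows. The ten pairs of values and the class limits are hypothetical (they are not the data of Table 12.1 or Table 12.2), and the function names are ours. Classes are taken as half-open intervals of equal width; values falling outside the stated classes would simply not be counted in this sketch.

```python
from collections import Counter

def class_index(value, lower, width):
    """Index m of the class [lower + m*width, lower + (m+1)*width)."""
    return int((value - lower) // width)

def bivariate_table(pairs, x_lower, x_width, k, y_lower, y_width, l):
    """Return an l x k table: rows are y-classes, columns are x-classes."""
    counts = Counter((class_index(y, y_lower, y_width),
                      class_index(x, x_lower, x_width)) for x, y in pairs)
    return [[counts[(row, col)] for col in range(k)] for row in range(l)]

# Hypothetical (x, y) pairs.
pairs = [(105, 210), (118, 260), (131, 250), (142, 330), (150, 310),
         (163, 345), (171, 390), (128, 290), (155, 365), (190, 420)]

table = bivariate_table(pairs, x_lower=100, x_width=25, k=4,
                        y_lower=200, y_width=100, l=3)

x_marginal = [sum(col) for col in zip(*table)]   # column totals
y_marginal = [sum(row) for row in table]         # row totals
# Array (conditional) distribution of y for the second x-class.
y_given_second_x = [row[1] for row in table]

for row in table:
    print(row)
print(x_marginal, y_marginal, y_given_second_x)
```

As in the text, the column totals give the marginal distribution of x, the row totals that of y, and any single column an array distribution of y for given x.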


[Fig. 12.1 Scatter diagram for the data of Table 12.1. Axes: marks in
college test and marks in university examination.]


12.2 Scatter diagram 

The simplest mode of diagrammatic representation of bivariate 
data is the use of a scatter diagram (or dot diagram). Taking two
perpendicular axes of co-ordinates, one for x and the other for y, each
pair of values is plotted as a point on graph paper. The whole set
of points taken together constitutes a scatter diagram. The data of
Table 12.1 have been represented in this way in Fig. 12.1. This
method is, of course, not suitable when the number of individuals is
very large. In such a case, one may use some three-dimensional
analogue of either the histogram or the frequency polygon,
constructed on the basis of a bivariate frequency table.


12.3 Correlation 
In the bivariate case, we may analyse the data relating to each 
variable separately by using the methods discussed in Chapters 6— 
10; but here we are primarily interested in the relationship between 
the two variables and for this some new methods have to be devised. 
The problems with which we are mainly concerned may be of two
types. First, the data may reveal some relationship between the two
variables and we may want to measure the extent to which they are
related. Secondly, there may be one variable of particular interest
and the other, regarded as an auxiliary variable, may be studied for
its possible aid in throwing some light on the former. One is then
interested in using any relationship that may be found from the
observed data for making estimates or predictions of the principal
variable in situations similar to the one under consideration.
Regarding the first type of problem, the simplest case occurs
when, from the scatter diagram or otherwise, the variables are found
to be linearly related, at least approximately. Here, with the
values of one variable lying in any assigned interval, however small,
the corresponding values of the other variable may be found to differ
considerably. Let us take the average of these values. If it is found
that as one variable increases, the other also increases, in general
or on the average, there will be said to be positive correlation between
them (Fig. 12.2a). On the other hand, as one variable increases,
the other may decrease on the average. We then say that there is
negative correlation between them (Fig. 12.2b). There may still be
a third situation where as one variable increases, the other remains
constant on the average. This is the case of zero or no correlation and
the two variables are then said to be uncorrelated (Fig. 12.2c).


[Fig. 12.2a Positive correlation. Fig. 12.2b Negative correlation.
Fig. 12.2c Zero correlation.]


12.4 Correlation coefficient 


In the scatter diagram, let us take two axes of co-ordinates for
$x' = x - \bar{x}$ and $y' = y - \bar{y}$ (see Fig. 12.3). The origin of the new axes
must be the point $(\bar{x}, \bar{y})$, in terms of the original co-ordinates. The
points of the scatter diagram may now be seen as distributed over the
four quadrants (I-IV) of the $(x', y')$-plane. Further, in quadrant I
$x'$ and $y'$ are both positive, in II $x'$ is negative and $y'$ positive, in III
$x'$ and $y'$ are both negative, while in IV $x'$ is positive and $y'$ negative.
Hence the product $x'y'$ is positive for all points occurring in quadrants
I and III, while it is negative for all points in quadrants II and IV.

[Fig. 12.3 A scatter diagram with the four quadrants
of the $(x', y')$-plane.]

In the case of positive correlation, the general tendency of the
points is to lie in quadrants I and III, so that in the sum $\sum_i x'_i y'_i$ the
positive products outweigh the negative ones; and the sum thus
becomes a positive quantity. In the case of negative correlation, the
trend of the points is through quadrants II and IV; in the sum $\sum_i x'_i y'_i$
the negative values now outweigh the positive ones, and the sum thus
becomes a negative quantity. Lastly, when there is no correlation,
the points are equally distributed over the four quadrants, and the
sum $\sum_i x'_i y'_i$ becomes zero, as the set of positive products and that of
negative products just balance each other.

Consequently, a natural measure of correlation will seem to be
the sum

$$\sum_i x'_i y'_i = \sum_i (x_i - \bar{x})(y_i - \bar{y}). \qquad (12.1)$$

But this sum is also dependent on some factors that have nothing
to do with the correlation between the variables. For one thing, it
depends on the number of pairs of values of $x$ and $y$ which are taken
into account. Secondly, it also depends on the units in which the
variables $x$ and $y$ are measured and also on their variability. The
first defect can be removed by dividing the sum by the number of
pairs, $n$. In order to eliminate the other defects, we divide the sum
by the product of the standard deviations, $s_x$ and $s_y$ (both assumed
to be > 0). The resulting measure of correlation is then

$$r = \frac{\frac{1}{n}\sum_i (x_i - \bar{x})(y_i - \bar{y})}{s_x s_y}. \qquad (12.2)$$

$r$ is called the (product-moment) correlation coefficient of the variables.
If one wants to specify the variables, one may use the symbol $r_{xy}$, as
we shall have to do in a later chapter.

The quantity in the numerator of r is called the covariance of x and 

J cov(x, y), in analogy with the term variance that is used in the case 

of a single variable. Since the standard deviations, s, and s,, are 

the positive square-roots of the variances of x and J, Say var(x) and 


336 FUNDAMENTALS OF STATISTIOS 


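var(y), one may also write

\[ r = \frac{\mathrm{cov}(x, y)}{\sqrt{\mathrm{var}(x)}\,\sqrt{\mathrm{var}(y)}}. \tag{12.3} \]

Again,

\[ n\,\mathrm{cov}(x, y) = \sum_i (x_i-\bar{x})(y_i-\bar{y}) = \sum_i x_i y_i - n\bar{x}\bar{y}
   = \sum_i x_i y_i - \Big(\sum_i x_i\Big)\Big(\sum_i y_i\Big)\Big/ n, \]

\[ n\,\mathrm{var}(x) = \sum_i (x_i-\bar{x})^2 = \sum_i x_i^2 - n\bar{x}^2
   = \sum_i x_i^2 - \Big(\sum_i x_i\Big)^2\Big/ n, \]

and, similarly,

\[ n\,\mathrm{var}(y) = \sum_i y_i^2 - \Big(\sum_i y_i\Big)^2\Big/ n. \]

Hence r may be expressed in the alternative forms:

\[ r = \frac{\sum_i x_i y_i - (\sum_i x_i)(\sum_i y_i)/n}
            {\{\sum_i x_i^2 - (\sum_i x_i)^2/n\}^{1/2}\,\{\sum_i y_i^2 - (\sum_i y_i)^2/n\}^{1/2}}
     = \frac{n\sum_i x_i y_i - (\sum_i x_i)(\sum_i y_i)}
            {\{n\sum_i x_i^2 - (\sum_i x_i)^2\}^{1/2}\,\{n\sum_i y_i^2 - (\sum_i y_i)^2\}^{1/2}}. \tag{12.4} \]

The last form will be found to be the most convenient for
computing r from raw data.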
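As a rough illustration of the computational form (12.4), the following Python sketch evaluates r from the raw sums and compares it with the defining form (12.2). The function names and the five pairs of values are assumptions made only for this illustration; they are not data from the text.

```python
from math import sqrt

def corr_raw_sums(xs, ys):
    """r from the raw-sum form (12.4)."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxx = sum(x * x for x in xs)
    syy = sum(y * y for y in ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    num = n * sxy - sx * sy
    den = sqrt(n * sxx - sx ** 2) * sqrt(n * syy - sy ** 2)
    return num / den

def corr_definition(xs, ys):
    """r from the defining form (12.2): covariance over the product of s.d.'s."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n
    s_x = sqrt(sum((x - mx) ** 2 for x in xs) / n)
    s_y = sqrt(sum((y - my) ** 2 for y in ys) / n)
    return cov / (s_x * s_y)

# Hypothetical pairs of values, chosen only to exercise the two formulas.
xs = [1, 2, 3, 4, 5]
ys = [2, 1, 4, 3, 6]

r_raw = corr_raw_sums(xs, ys)
r_def = corr_definition(xs, ys)
print(r_raw, r_def)
```

Both routes should agree up to rounding error; the raw-sum form only needs five running totals, which is why it suits hand or single-pass computation.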


12.5 Properties of the correlation coefficient 

(a) Obviously, the correlation coefficient of x and y is a pure
number, that is, is independent of the units of measurement of x
and y.

(b) Let u=(x−A)/c and v=(y−B)/d, where A, B, c and d are
four arbitrarily chosen constants. Corresponding to each pair of
values $(x_i, y_i)$, we have a pair of values of the new variables, say
$(u_i, v_i)$, where

\[ u_i = (x_i - A)/c \quad\text{and}\quad v_i = (y_i - B)/d. \]

Since $x_i = A + c u_i$ and $\bar{x} = A + c\bar{u}$,

\[ x_i - \bar{x} = c(u_i - \bar{u}). \]

Similarly,

\[ y_i - \bar{y} = d(v_i - \bar{v}). \]

Hence

\[ \mathrm{cov}(x, y) = cd \times \frac{1}{n}\sum_i (u_i-\bar{u})(v_i-\bar{v}) = cd\,\mathrm{cov}(u, v), \]

\[ \mathrm{var}(x) = c^2 \times \frac{1}{n}\sum_i (u_i-\bar{u})^2 = c^2\,\mathrm{var}(u), \]

and

\[ \mathrm{var}(y) = d^2 \times \frac{1}{n}\sum_i (v_i-\bar{v})^2 = d^2\,\mathrm{var}(v). \]

Thus

\[ r_{xy} = \frac{cd\,\mathrm{cov}(u, v)}{\sqrt{c^2\,\mathrm{var}(u)}\,\sqrt{d^2\,\mathrm{var}(v)}}
          = \frac{cd}{|c|\,|d|}\cdot\frac{\mathrm{cov}(u, v)}{\sqrt{\mathrm{var}(u)}\,\sqrt{\mathrm{var}(v)}}
          = \frac{cd}{|c|\,|d|}\, r_{uv}. \tag{12.5} \]

If c and d have the same sign, $cd/(|c|\,|d|)$ is +1. Thus in this case
$r_{xy}$ and $r_{uv}$ are equal, both in magnitude and in sign. On the other
hand, if c and d are of opposite signs, $cd/(|c|\,|d|)$ is −1, and here
$r_{xy}$ and $r_{uv}$ will have the same magnitude but will be of opposite signs.
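(c) In formula (12.2), let us put

\[ x_i' = (x_i - \bar{x})/s_x \quad\text{and}\quad y_i' = (y_i - \bar{y})/s_y. \]

Then

\[ \frac{1}{n}\sum_i {x_i'}^2 = \frac{1}{n}\sum_i {y_i'}^2 = 1 \]

and

\[ r = \frac{1}{n}\sum_i x_i' y_i'. \tag{12.6} \]

Since

\[ \frac{1}{n}\sum_i (x_i' + y_i')^2 \ge 0, \]

or

\[ \frac{1}{n}\sum_i {x_i'}^2 + \frac{1}{n}\sum_i {y_i'}^2 + \frac{2}{n}\sum_i x_i' y_i' \ge 0, \]

or $2(1+r) \ge 0$, we have

\[ r \ge -1. \tag{12.7} \]

Again, since

\[ \frac{1}{n}\sum_i (x_i' - y_i')^2 \ge 0, \]

or

\[ \frac{1}{n}\sum_i {x_i'}^2 + \frac{1}{n}\sum_i {y_i'}^2 - \frac{2}{n}\sum_i x_i' y_i' \ge 0, \]

or $2(1-r) \ge 0$. Hence

\[ r \le 1. \tag{12.8} \]

Thus the correlation coefficient must necessarily lie between
−1 and +1.

r takes the lowest value (−1) when, for each i,

\[ y_i' = -x_i', \quad\text{or}\quad y_i - \bar{y} = -\frac{s_y}{s_x}(x_i - \bar{x}), \]

and it takes the highest value (+1) when, for each i,

\[ y_i' = x_i', \quad\text{or}\quad y_i - \bar{y} = \frac{s_y}{s_x}(x_i - \bar{x}). \]

In these cases the variables are thus seen to be perfectly correlated;
that is, each variable is an exact linear function of the other. The
slope of the line is positive when r=+1 and negative when r=−1.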
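Properties (b) and (c) can be checked numerically. The sketch below is only an illustration under assumed inputs: the helper `corr`, the five pairs of values and the constants A, c, B, d are all hypothetical. It confirms that a change of origin and scale alters r at most in sign, and that exact linear relations give r = +1 or r = −1.

```python
from math import sqrt

def corr(xs, ys):
    """Product-moment correlation coefficient, as in formula (12.2)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n
    var_x = sum((x - mx) ** 2 for x in xs) / n
    var_y = sum((y - my) ** 2 for y in ys) / n
    return cov / sqrt(var_x * var_y)

# Hypothetical pairs of values.
xs = [120, 175, 126, 187, 123]
ys = [368, 358, 262, 376, 326]

# Hypothetical constants; c and d are of opposite signs here.
A, c, B, d = 100, 5, 300, -2
us = [(x - A) / c for x in xs]
vs = [(y - B) / d for y in ys]

r_xy = corr(xs, ys)
r_uv = corr(us, vs)
sign = (c * d) / (abs(c) * abs(d))   # the factor cd / (|c||d|) of (12.5)

# Exact linear functions of x with positive and negative slope.
r_plus = corr(xs, [3 + 2 * x for x in xs])
r_minus = corr(xs, [3 - 2 * x for x in xs])

print(r_xy, sign * r_uv, r_plus, r_minus)
```

With c and d of opposite signs the factor is −1, so `r_uv` comes out as the negative of `r_xy`.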

Example 12.1 Let us consider the data of Table 12.1. The
correlation coefficient between marks in the college test and marks
in the university examination is computed below.

We may denote by x the marks obtained in the college test and
by y the marks in the university examination. For convenience in
computations, we make a change of base for each of the variables
and take

x − 100 and y − 300

as u and v, respectively. The necessary calculations are indicated
in the following table.


TABLE 12.5

DETERMINATION OF CORRELATION COEFFICIENT BETWEEN
MARKS IN COLLEGE TEST AND MARKS IN UNIVERSITY
EXAMINATION (DATA OF TABLE 12.1)

      u        v       u²        v²        uv
     20       68      400     4,624     1,360
     75       58    5,625     3,364     4,350
     26      −38      676     1,444      −988
     87       76    7,569     5,776     6,612
     23       26      529       676       598
     21       41      441     1,681       861
     75      103    5,625    10,609     7,725
     33       26    1,089       676       858
     44       46    1,936     2,116     2,024
      9      −45       81     2,025      −405
     65       62    4,225     3,844     4,030
     14       61      196     3,721       854
     64       82    4,096     6,724     5,248
     25       19      625       361       475
Total
    993    1,004   63,061    92,376    67,313


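The correlation coefficient between x and y is, by virtue of
property (b),

\[ r_{xy} = r_{uv} = \frac{n\sum_i u_i v_i - (\sum_i u_i)(\sum_i v_i)}
                         {\{n\sum_i u_i^2 - (\sum_i u_i)^2\}^{1/2}\,\{n\sum_i v_i^2 - (\sum_i v_i)^2\}^{1/2}} \]

\[ = \frac{20 \times 67{,}313 - 993 \times 1{,}004}
          {\{20 \times 63{,}061 - (993)^2\}^{1/2}\,\{20 \times 92{,}376 - (1{,}004)^2\}^{1/2}} \]

\[ = \frac{349{,}288}{(275{,}171)^{1/2}\,(839{,}504)^{1/2}}
   = \frac{349{,}288}{524.6 \times 916.2} \]

\[ = 0.727. \]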
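The arithmetic of Example 12.1 can be retraced from the column totals alone. The short Python sketch below takes n = 20 and the five totals quoted in the example; the variable names are assumptions introduced here.

```python
from math import sqrt

# Totals quoted in Example 12.1 (u = x - 100, v = y - 300, n = 20).
n = 20
sum_u, sum_v = 993, 1004
sum_uu, sum_vv, sum_uv = 63061, 92376, 67313

numerator = n * sum_uv - sum_u * sum_v   # n*sum(uv) - sum(u)*sum(v)
term_u = n * sum_uu - sum_u ** 2         # n*sum(u^2) - (sum u)^2
term_v = n * sum_vv - sum_v ** 2         # n*sum(v^2) - (sum v)^2

r = numerator / sqrt(term_u * term_v)
print(numerator, term_u, term_v, round(r, 3))
```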


12.6 Calculation of correlation coefficient from grouped data 

Suppose the values of x and y are given in the form of a bivariate
frequency table with k classes for x and l classes for y. Let us denote
by $x_i$ the mid-point of the ith class of x and by $y_j$ the mid-point of
the jth class of y. These two classes define the (i, j)th cell of the
bivariate frequency table. Let $f_{ij}$ denote the frequency in this cell.
Then the correlation coefficient is obtained from the formula

\[ r = \frac{\sum_i\sum_j x_i y_j f_{ij} - n\bar{x}\bar{y}}
            {\{\sum_i x_i^2 f_{i0} - n\bar{x}^2\}^{1/2}\,\{\sum_j y_j^2 f_{0j} - n\bar{y}^2\}^{1/2}}, \tag{12.9} \]

where $f_{i0} = \sum_j f_{ij}$, $f_{0j} = \sum_i f_{ij}$ and $n = \sum_i\sum_j f_{ij}$.

Obviously,

\[ \bar{x} = \sum_i x_i f_{i0}/n, \qquad \bar{y} = \sum_j y_j f_{0j}/n. \]

To reduce computational labour, we may make changes of
base and scale for both x and y. It will be found advantageous
(when the classes for each variable are equally wide) to take as bases
two class-marks, say A and B, somewhere in the middle of the ranges
of x and y, respectively, and as units the widths of the corresponding
class-intervals, say c and d.

The new variables are then u=(x−A)/c and v=(y−B)/d.


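From property (b) stated in Section 12.5, it follows that

\[ r_{xy} = r_{uv} = \frac{\sum_i\sum_j u_i v_j f_{ij} - n\bar{u}\bar{v}}
                         {\{\sum_i u_i^2 f_{i0} - n\bar{u}^2\}^{1/2}\,\{\sum_j v_j^2 f_{0j} - n\bar{v}^2\}^{1/2}} \]

\[ = \frac{n\sum_i\sum_j u_i v_j f_{ij} - (\sum_i u_i f_{i0})(\sum_j v_j f_{0j})}
          {\{n\sum_i u_i^2 f_{i0} - (\sum_i u_i f_{i0})^2\}^{1/2}\,\{n\sum_j v_j^2 f_{0j} - (\sum_j v_j f_{0j})^2\}^{1/2}}, \tag{12.11} \]

since $\bar{u} = \sum_i u_i f_{i0}/n$ and $\bar{v} = \sum_j v_j f_{0j}/n$.

It is easy to calculate $\sum_i u_i f_{i0}$, $\sum_i u_i^2 f_{i0}$, $\sum_j v_j f_{0j}$ and
$\sum_j v_j^2 f_{0j}$ from the bivariate frequency table. The calculation of
$\sum_i\sum_j u_i v_j f_{ij}$ is performed in two stages. At first one may
calculate, for different fixed values of j, $\sum_i u_i f_{ij} = U_j$ and then
obtain the sum $\sum_j v_j U_j$, which gives

\[ \sum_j v_j \sum_i u_i f_{ij} = \sum_i\sum_j u_i v_j f_{ij}. \]

Alternatively, one may calculate, for different fixed values of i,
$\sum_j v_j f_{ij} = V_i$ and at the next stage obtain $\sum_i u_i V_i$, which is
also equal to $\sum_i\sum_j u_i v_j f_{ij}$. The relation

\[ \sum_i u_i V_i = \sum_j v_j U_j \tag{12.12} \]

serves as a useful check on the calculations. Two other checks are

\[ \sum_i V_i = \sum_i\sum_j v_j f_{ij} = \sum_j v_j f_{0j} \tag{12.13} \]

and

\[ \sum_j U_j = \sum_i u_i f_{i0}. \tag{12.14} \]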

Example 12.2 We shall compute the correlation coefficient for
the data of Table 12.2 by the above method. Let us denote the
income of family by x and the percentage of income spent on food
by y.

[Table 12.4. Determination of correlation coefficient between family income and percentage of family income spent on food (for the bivariate frequency distribution of Table 12.2).]

Since the x-classes have width 150 each and the y-classes have
width 2 each, we take, as our new variables,

\[ u = \frac{x - 825.5}{150}, \qquad v = \frac{y - 39.5}{2}, \]

825.5 and 39.5 being the arbitrarily chosen origins. The necessary
calculations are shown in Table 12.4. On the basis of this table,
we have

\[ r = \frac{n\sum_i u_i V_i - (\sum_j U_j)(\sum_i V_i)}
            {\{n\sum_i u_i^2 f_{i0} - (\sum_j U_j)^2\}^{1/2}\,\{n\sum_j v_j^2 f_{0j} - (\sum_i V_i)^2\}^{1/2}} \]

\[ = \frac{200 \times (-780) - (-5) \times (-103)}
          {\{200 \times 725 - (-5)^2\}^{1/2}\,\{200 \times 1{,}229 - (-103)^2\}^{1/2}} \]

\[ = \frac{-156{,}515}{(144{,}975)^{1/2}\,(235{,}191)^{1/2}} = \frac{-156{,}515}{380.8 \times 485.0} \]

\[ = -0.847. \]
12.7 Regression lines 
Let us now consider the problem of predicting, for an individual,
the value of one variable (say y) from the given value of another
variable (say x). To solve this problem it would be necessary to
express the relationship between y and x in a mathematical form.
Suppose in a particular case the approximate relation may be
represented by a line:

\[ Y = a + bx, \tag{12.15} \]

Y denoting the predicted value of y. To get an appropriate line,
it is necessary to determine a and b from the observed data. Let us
assume that there are n given pairs of values of x and y, the ith pair
being denoted by $(x_i, y_i)$. The above line gives as an estimate of $y_i$
the value

\[ Y_i = a + b x_i. \]

The difference $y_i - Y_i$ is thus the error of estimate for the ith pair,
i = 1, 2, ..., n. Since the line is to be used for estimating purposes,
it is reasonable to require that a and b should be such that these errors
of estimate are as small as possible. However, it will not be enough
to minimise the sum of these errors, because the errors, which may
be positive or negative, may even add up to zero for a line for which
the individual errors are of high magnitude. In most cases a
satisfactory method of determining a and b would be the method of least
squares, which consists in minimising the sum of squares of the
errors of estimation. Thus the problem is to choose a and b in such
a way as to minimise

\[ S = \sum_i (y_i - Y_i)^2 = \sum_i (y_i - a - b x_i)^2. \]


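The desired values are obtained by solving the simultaneous
equations, called the normal equations,

\[ \sum_i (y_i - a - b x_i) = 0 \quad\text{and}\quad \sum_i x_i (y_i - a - b x_i) = 0, \]

i.e.

\[ \sum_i y_i = na + b\sum_i x_i \quad\text{and}\quad \sum_i x_i y_i = a\sum_i x_i + b\sum_i x_i^2. \tag{12.16} \]

The roots of the equations are

\[ b = \frac{n\sum_i x_i y_i - (\sum_i x_i)(\sum_i y_i)}{n\sum_i x_i^2 - (\sum_i x_i)^2}
     = \frac{\frac{1}{n}\sum_i (x_i-\bar{x})(y_i-\bar{y})}{\frac{1}{n}\sum_i (x_i-\bar{x})^2}
     = \frac{\mathrm{cov}(x, y)}{\mathrm{var}(x)} = r\,\frac{s_y}{s_x}, \tag{12.17} \]

\[ a = \bar{y} - b\bar{x}. \tag{12.18} \]

Substituting these values in (12.15), we have the desired
prediction formula:

\[ Y = \bar{y} + r\,\frac{s_y}{s_x}(x - \bar{x}). \tag{12.19} \]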


The line (12.19) is called the regression line of y on x.
$a = \bar{y} - r\frac{s_y}{s_x}\bar{x}$ is the y-intercept of the line and
$b = r\frac{s_y}{s_x}$ is its slope. The quantity b is the amount by which Y
increases for a unit increment in the value of x. It is called the
regression coefficient of y on x.

Similarly, if we are interested in predicting x from y, we use the
regression line of x on y, which has the equation

\[ X = \bar{x} + r\,\frac{s_x}{s_y}(y - \bar{y}). \tag{12.20} \]

$r\frac{s_x}{s_y}$, which is the amount by which X increases for a unit
increment in y, is the regression coefficient of x on y.

It may be noted that both the regression lines pass through the
point $(\bar{x}, \bar{y})$, which is, in consequence, their point of intersection.
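A minimal sketch of the least-squares fit described above is given here in Python. The function `fit_line` and the data are assumptions made for illustration; the function simply evaluates the roots (12.17) and (12.18) of the normal equations, and is applied twice to obtain both regression lines.

```python
def fit_line(xs, ys):
    """Return (a, b) of the least-squares line Y = a + b*x, using (12.17)-(12.18)."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    b = (n * sxy - sx * sy) / (n * sxx - sx ** 2)
    a = sy / n - b * sx / n
    return a, b

# Hypothetical pairs of values.
xs = [1, 2, 3, 4, 5]
ys = [2, 1, 4, 3, 6]

a_yx, b_yx = fit_line(xs, ys)   # regression of y on x
a_xy, b_xy = fit_line(ys, xs)   # regression of x on y (roles interchanged)

x_bar = sum(xs) / len(xs)
y_bar = sum(ys) / len(ys)
print(a_yx, b_yx, a_xy, b_xy)
print(a_yx + b_yx * x_bar, y_bar)   # first line passes through the point of means
print(a_xy + b_xy * y_bar, x_bar)   # so does the second
```

The two fitted lines differ unless the points lie exactly on a line, but both pass through the point of means.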


12.8 Some important results relating to regression lines 

Consider any one of the regression lines, say that of y on x.
It has the following properties:

(a) Let u=(x−A)/c and v=(y−B)/d.

Then, the regression coefficient of y on x, denoted by $b_{yx}$ for the
sake of definiteness, is

\[ b_{yx} = \frac{\mathrm{cov}(x, y)}{\mathrm{var}(x)} = \frac{cd\,\mathrm{cov}(u, v)}{c^2\,\mathrm{var}(u)}
          = \frac{d}{c}\cdot\frac{\mathrm{cov}(u, v)}{\mathrm{var}(u)} = \frac{d}{c}\, b_{vu}. \tag{12.21} \]

The other constants in the regression equation are, in terms of
u and v,

\[ \bar{y} = B + d\bar{v} \quad\text{and}\quad \bar{x} = A + c\bar{u}. \]

(b) Since $Y_i = \bar{y} + r\frac{s_y}{s_x}(x_i - \bar{x})$,

\[ \sum_i Y_i = n\bar{y} + r\,\frac{s_y}{s_x}\sum_i (x_i - \bar{x}). \]

Dividing both sides by n and remembering that $\sum_i (x_i - \bar{x}) = 0$, we have

\[ \bar{Y} = \bar{y}. \tag{12.22} \]

In words, the mean of the observed values of y is equal to the
mean of the corresponding predicted values.

From this it follows that the mean of the errors of estimate,
$e_i = y_i - Y_i$, is zero.

(c) From (b), we have

\[ Y_i - \bar{Y} = r\,\frac{s_y}{s_x}(x_i - \bar{x}). \]

Hence

\[ \mathrm{var}(Y) = \frac{1}{n}\sum_i (Y_i - \bar{Y})^2
   = r^2\,\frac{s_y^2}{s_x^2}\cdot\frac{1}{n}\sum_i (x_i - \bar{x})^2
   = r^2\,\frac{s_y^2}{s_x^2}\times s_x^2 = r^2 s_y^2. \tag{12.23} \]

Thus

\[ |r| = \frac{s_Y}{s_y}, \tag{12.24} \]

which may be interpreted as the proportion of the total variability of
y which is accounted for by its linear regression on x.


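(d) Again, the residual variance is

\[ \mathrm{var}(e) = \frac{1}{n}\sum_i e_i^2 = \frac{1}{n}\sum_i (y_i - Y_i)^2
   = \frac{1}{n}\sum_i \Big\{(y_i - \bar{y}) - r\,\frac{s_y}{s_x}(x_i - \bar{x})\Big\}^2 \]

\[ = \frac{1}{n}\sum_i (y_i - \bar{y})^2 - 2r\,\frac{s_y}{s_x}\cdot\frac{1}{n}\sum_i (x_i - \bar{x})(y_i - \bar{y})
     + r^2\,\frac{s_y^2}{s_x^2}\cdot\frac{1}{n}\sum_i (x_i - \bar{x})^2 \]

\[ = s_y^2 - 2r\,\frac{s_y}{s_x}\times r s_x s_y + r^2\,\frac{s_y^2}{s_x^2}\times s_x^2. \]

Hence

\[ \mathrm{var}(e) = s_y^2 (1 - r^2). \tag{12.25} \]

The standard deviation of e, which is called the standard error of
estimate of y from its linear regression on x, is denoted by $s_{y.x}$. We
have, then,

\[ s_{y.x} = s_y\sqrt{1 - r^2}. \tag{12.26} \]

Since var(e) ≥ 0, we have, from (12.25),

\[ r^2 \le 1, \quad\text{or}\quad -1 \le r \le 1, \]

a result which has already been proved in a different way.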

If $r = \pm 1$, then var(e) = 0. That is, in this case $y_i = Y_i$ for each i,
so that all points in the scatter diagram lie on the regression line.
Here the linear regression equation will be the ideal predicting
formula for y when x is given.

On the other hand, if r = 0, then var(e) = $s_y^2$. This means that
the errors of estimation are as variable as the original values
of y, and hence the linear regression equation is of no help in
predicting the value of y when the corresponding value of x is given.
(This is also seen from the fact that for the case when r = 0

\[ Y = \bar{y}, \]

so that, so far as the linear regression equation is concerned, the
value of x throws no light whatever on the value of y.)

Similar results will, of course, hold for the regression of x on y.

From this discussion, it would be obvious that the numerical
value of r also serves as a measure of the worth of the linear
regression equation of one variable on the other as a predicting
formula. The higher the numerical value of r, the more efficient is
the regression equation.

(e) We have

\[ \mathrm{cov}(x, e) = \frac{1}{n}\sum_i x_i e_i - \bar{x}\bar{e} = \frac{1}{n}\sum_i x_i e_i = 0, \]

because of the normal equations (12.16). Hence

\[ r_{xe} = 0, \tag{12.27} \]

and e may be looked upon as the part of y which is uncorrelated
with x.

(f) Since Y = a + bx, we have

\[ \mathrm{cov}(Y, e) = \frac{a}{n}\sum_i e_i + \frac{b}{n}\sum_i x_i e_i = 0, \]

owing to the normal equations. Also,

\[ y = Y + e, \]

so that

\[ \mathrm{cov}(y, Y) = \mathrm{var}(Y) + \mathrm{cov}(Y, e) = \mathrm{var}(Y). \]

Hence

\[ r_{yY} = \frac{\mathrm{cov}(y, Y)}{\sqrt{\mathrm{var}(y)}\,\sqrt{\mathrm{var}(Y)}}
          = \sqrt{\frac{\mathrm{var}(Y)}{\mathrm{var}(y)}} = |r|, \tag{12.28} \]

which means that the correlation between y and its 'predicted' value
Y must be non-negative and must be numerically the same as the
correlation between y and x.

(g) We have seen that

\[ b_{yx} = r\,\frac{s_y}{s_x} \quad\text{and}\quad b_{xy} = r\,\frac{s_x}{s_y}. \]

Hence

\[ b_{yx} \times b_{xy} = r^2, \quad\text{or}\quad |r| = \sqrt{b_{yx} \times b_{xy}}. \tag{12.29} \]

Thus numerically the correlation coefficient is the geometric
mean of the two regression coefficients. As regards the sign of r, it
is the same as the common sign of the two regression coefficients.
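Results (b) and (d) to (g) lend themselves to a quick numerical check. The Python sketch below uses hypothetical data with a negative correlation and hypothetical helper names; each entry of `checks` is a quantity that the corresponding result says should be zero, apart from rounding error.

```python
from math import sqrt

def mean(zs):
    return sum(zs) / len(zs)

def cov(ps, qs):
    """Covariance with divisor n; cov(zs, zs) is the variance."""
    mp, mq = mean(ps), mean(qs)
    return sum((p - mp) * (q - mq) for p, q in zip(ps, qs)) / len(ps)

def corr(ps, qs):
    return cov(ps, qs) / sqrt(cov(ps, ps) * cov(qs, qs))

# Hypothetical pairs of values (negatively correlated).
xs = [1, 2, 3, 4, 5]
ys = [6, 3, 4, 1, 2]

r = corr(xs, ys)
b_yx = cov(xs, ys) / cov(xs, xs)
b_xy = cov(xs, ys) / cov(ys, ys)
a = mean(ys) - b_yx * mean(xs)

Y = [a + b_yx * x for x in xs]            # predicted values
e = [y - Yi for y, Yi in zip(ys, Y)]      # errors of estimate

checks = {
    "(b) mean of errors": mean(e),
    "(d) var(e) - s_y^2 (1 - r^2)": cov(e, e) - cov(ys, ys) * (1 - r * r),
    "(e) cov(x, e)": cov(xs, e),
    "(f) r_yY - |r|": corr(ys, Y) - abs(r),
    "(g) b_yx * b_xy - r^2": b_yx * b_xy - r * r,
}
for label, value in checks.items():
    print(label, value)
```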

Example 12.3 For the data of Table 12.1, the regression of y
(marks in the university examination) on x (marks in the college
test) is of considerable practical importance.

Here

\[ \bar{x} = 100 + \frac{993}{20} = 149.65 \quad\text{and}\quad \bar{y} = 300 + \frac{1{,}004}{20} = 350.2. \]

The regression coefficient of y on x is

\[ b_{yx} = \frac{n\sum_i u_i v_i - (\sum_i u_i)(\sum_i v_i)}{n\sum_i u_i^2 - (\sum_i u_i)^2}
          = \frac{349{,}288}{275{,}171} = 1.269. \]

Hence the linear regression of y on x is represented by the
equation

Y = 350.2 + 1.269(x − 149.65)
or Y = 160.294 + 1.269x.

This equation may be used to predict how much a new student,
under similar conditions, is likely to score in the university
examination, from a knowledge of the marks he obtains in the college test.
Since the correlation coefficient is quite high (0.727), the prediction
is expected to be fairly precise.

The regression line of x on y is

X = 149.65 + 0.416(y − 350.2)
  = 3.967 + 0.416y.

This regression equation is, however, mainly of theoretical interest.
(This could be used to answer questions of the form: how much
does a student getting 360 marks in the university examination score
in the college test, on the average?)

The two regression lines are shown on the scatter diagram in 
Fig. 12.1. 
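The figures of Example 12.3 can be reproduced from the totals already used in Example 12.1. In the Python sketch below the variable names are assumptions; the slopes are rounded to three decimals before the intercepts are formed, which is how the intercepts 160.294 and 3.967 quoted above arise.

```python
# Totals from Example 12.1 (u = x - 100, v = y - 300, n = 20).
n = 20
sum_u, sum_v = 993, 1004
sum_uu, sum_vv, sum_uv = 63061, 92376, 67313

x_bar = 100 + sum_u / n
y_bar = 300 + sum_v / n

numerator = n * sum_uv - sum_u * sum_v
b_yx = numerator / (n * sum_uu - sum_u ** 2)   # regression coefficient of y on x
b_xy = numerator / (n * sum_vv - sum_v ** 2)   # regression coefficient of x on y

# Intercepts formed from slopes rounded to three decimals.
a_yx = y_bar - round(b_yx, 3) * x_bar
a_xy = x_bar - round(b_xy, 3) * y_bar

print(round(x_bar, 2), round(y_bar, 1))
print(round(b_yx, 3), round(a_yx, 3))
print(round(b_xy, 3), round(a_xy, 3))
```

Carrying the unrounded slope through instead changes the intercept of the first line slightly in the second decimal place; the fitted line itself is practically the same.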

Example 12.4 The bivariate frequency distribution given in Table
12.2 may now be considered. Here the variables are family income
(x) and percentage of family income spent on food (y). Here, again,
the regression of y on x is important from the practical point of view.
On the basis of the calculations done in connection with Example 12.2,
we have

\[ \bar{x} = 825.5 + 150\bar{u} = 825.5 + 150 \times \frac{(-5)}{200} = 821.75 \]

and

\[ \bar{y} = 39.5 + 2\bar{v} = 39.5 + 2 \times \frac{(-103)}{200} = 38.47. \]

The regression coefficient of y on x is

\[ b_{yx} = \frac{2}{150}\times\frac{n\sum_i u_i V_i - (\sum_j U_j)(\sum_i V_i)}{n\sum_i u_i^2 f_{i0} - (\sum_j U_j)^2}
          = \frac{2}{150}\times\frac{(-156{,}515)}{144{,}975} = -0.0144. \]

Hence the regression equation of y on x is

Y = 38.47 − 0.0144(x − 821.75)
or Y = 50.30 − 0.0144x.

This, again, is expected to serve as a good prediction formula 


for y, given x, since the correlation coefficient between x and y is 
numerically quite high. 




12.9 Theoretical distribution of two variables 

As in the univariate case, here, too, we look for a simple
mathematical function of x and y to represent the joint distribution
of x and y in the population. If x and y are both continuous variables,
then this distribution is defined by a probability-density function,
f(x, y), which is such that

\[ \int_a^b\!\int_c^d f(x, y)\,dy\,dx \]

gives the compound probability that x lies in the interval (a, b) and
y in the interval (c, d), whatever the intervals may be. Assuming,
with no loss of generality, that the range of x as well as that of y is
from −∞ to ∞, then

\[ \int_{-\infty}^{\infty} f(x, y)\,dy = g(x) \quad\text{and}\quad \int_{-\infty}^{\infty} f(x, y)\,dx = h(y) \]

define the marginal distributions of x and y, respectively.
Again,

\[ \frac{f(x, y)}{g(x)} = f(y \mid x) \]

defines the conditional distribution or array distribution of y for
given x such that g(x) > 0.* Similarly,

\[ \frac{f(x, y)}{h(y)} = f(x \mid y) \]

defines the conditional distribution of x for given y such that h(y) > 0.

If the conditional distributions of one variable for given values of
the other are all identical or, equivalently, if

\[ f(x, y) = g(x)\,h(y) \]

for all x and y, then the two variables are said to be independent.
Otherwise, they are said to be associated.

The means of these conditional distributions, say $\eta_x$ and $\xi_y$, are
of special importance. The term 'regression' actually refers to the
relationship between $\eta_x$ and x or between $\xi_y$ and y. The equation
that expresses the conditional mean $\eta_x$ as a function of x is the
regression equation of y on x. If $\eta_x$ is of the form

\[ \eta_x = \alpha + \beta x, \tag{12.30a} \]

then the regression is said to be linear. In fitting a regression
line to observed data, we thus attempt to estimate the true regression,
assuming that it is linear. (12.30a) may be expressed in terms of the
means ($\mu_x$ and $\mu_y$), standard deviations ($\sigma_x$ and $\sigma_y$) and
correlation coefficient ($\rho$) as

\[ \eta_x = \mu_y + \rho\,\frac{\sigma_y}{\sigma_x}(x - \mu_x). \tag{12.30b} \]

Obviously, $\rho\sigma_y/\sigma_x$ is the regression coefficient of y on x.
The regression of x on y similarly refers to the relationship
between $\xi_y$ and y. In case this regression is linear,

\[ \xi_y = \mu_x + \rho\,\frac{\sigma_x}{\sigma_y}(y - \mu_y). \tag{12.31} \]

It will be sufficient for our purpose to consider one theoretical
distribution of the bivariate type, viz. the bivariate normal distribution.

If x and y are two variables, then they are said to be distributed
in the bivariate normal form if their probability-density function is

\[ f(x, y) = \frac{1}{2\pi\sigma_x\sigma_y\sqrt{1-\rho^2}}
   \exp\left[-\frac{1}{2(1-\rho^2)}\left\{\frac{(x-\mu_x)^2}{\sigma_x^2}
   - 2\rho\,\frac{(x-\mu_x)(y-\mu_y)}{\sigma_x\sigma_y}
   + \frac{(y-\mu_y)^2}{\sigma_y^2}\right\}\right],
   \quad -\infty < x, y < \infty. \tag{12.32} \]


*If g(x)=0, then the conditional distribution is not defined (cf. Section 3.5). 


The main properties of this distribution are stated below:

(a) If x and y are jointly normally distributed, then the marginal
distribution of each variable is of the (univariate) normal form:

\[ g(x) = \frac{1}{\sigma_x\sqrt{2\pi}}\exp\left[-\frac{(x-\mu_x)^2}{2\sigma_x^2}\right], \tag{12.33} \]

\[ h(y) = \frac{1}{\sigma_y\sqrt{2\pi}}\exp\left[-\frac{(y-\mu_y)^2}{2\sigma_y^2}\right]. \tag{12.34} \]

(b) The conditional distributions of each variable for given
values of the other are also of the (univariate) normal form:

\[ f(y \mid x) = \frac{1}{\sigma_y\sqrt{2\pi}\sqrt{1-\rho^2}}
   \exp\left[-\frac{1}{2\sigma_y^2(1-\rho^2)}\Big\{y - \mu_y - \rho\,\frac{\sigma_y}{\sigma_x}(x-\mu_x)\Big\}^2\right], \tag{12.35} \]

\[ f(x \mid y) = \frac{1}{\sigma_x\sqrt{2\pi}\sqrt{1-\rho^2}}
   \exp\left[-\frac{1}{2\sigma_x^2(1-\rho^2)}\Big\{x - \mu_x - \rho\,\frac{\sigma_x}{\sigma_y}(y-\mu_y)\Big\}^2\right]. \tag{12.36} \]

(c) It is seen from (b) that the means of the conditional
distributions of y for given values of x are given by

\[ \eta_x = \mu_y + \rho\,\frac{\sigma_y}{\sigma_x}(x - \mu_x) \tag{12.37} \]

and those of the conditional distributions of x for given values of y by

\[ \xi_y = \mu_x + \rho\,\frac{\sigma_x}{\sigma_y}(y - \mu_y). \tag{12.38} \]

Thus the regression of y on x and that of x on y are both linear
in case x and y are distributed in the bivariate normal form.

(d) From (b), it is also seen that the standard deviation of the
conditional distribution of y for any given value of x is

\[ \sigma_{y.x} = \sigma_y\sqrt{1-\rho^2}. \tag{12.39} \]

These conditional distributions are, therefore, homoscedastic (i.e. have
equal dispersion). Similarly, the conditional distributions of x for
given values of y are homoscedastic, each having the standard
deviation

\[ \sigma_{x.y} = \sigma_x\sqrt{1-\rho^2}. \]

(e) In the context of a bivariate normal distribution, the
correlation coefficient $\rho$ has a precise meaning.

Thus if $\rho = 0$, then

\[ f(x, y) = \frac{1}{2\pi\sigma_x\sigma_y}
   \exp\left[-\frac{1}{2}\left\{\frac{(x-\mu_x)^2}{\sigma_x^2} + \frac{(y-\mu_y)^2}{\sigma_y^2}\right\}\right]
   = g(x)\,h(y). \]

This means that if x and y are uncorrelated, then they
are also independent.

Again, if $\rho = \pm 1$, then (considering the conditional distributions
of y for fixed values of x) $\sigma_y\sqrt{1-\rho^2} = 0$. So here the values of y in
each array are exactly equal to the array mean; and since the
array mean is a linear function of x, $\rho = \pm 1$ implies that there is an
exact functional relationship (of the linear type) between x and y.

Furthermore, since $\sigma_y\sqrt{1-\rho^2}$ decreases as $\rho$ increases
numerically, it may be said that the higher the numerical value of $\rho$, the
nearer are x and y to linear functional relationship.
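Properties (b) to (e) can be spot-checked numerically by dividing the joint density (12.32) by the marginal density (12.33) and comparing the quotient with the normal density having the conditional mean (12.37) and standard deviation (12.39). The Python sketch below does this at one point; the parameter values, the point (x, y) and the function names are all hypothetical.

```python
from math import exp, pi, sqrt

def normal_pdf(z, mean, sd):
    """Univariate normal density, as in (12.33)."""
    return exp(-(z - mean) ** 2 / (2 * sd ** 2)) / (sd * sqrt(2 * pi))

def bivariate_normal_pdf(x, y, mx, my, sx, sy, rho):
    """Bivariate normal density, as in (12.32); requires |rho| < 1."""
    q = ((x - mx) ** 2 / sx ** 2
         - 2 * rho * (x - mx) * (y - my) / (sx * sy)
         + (y - my) ** 2 / sy ** 2)
    return exp(-q / (2 * (1 - rho ** 2))) / (2 * pi * sx * sy * sqrt(1 - rho ** 2))

# Hypothetical parameter values and evaluation point.
mx, my, sx, sy, rho = 1.0, 2.0, 1.5, 0.5, 0.6
x, y = 2.0, 2.3

# Conditional density of y given x, obtained as f(x, y) / g(x).
cond_from_joint = bivariate_normal_pdf(x, y, mx, my, sx, sy, rho) / normal_pdf(x, mx, sx)

# The same density written directly from (12.35), (12.37) and (12.39).
cond_mean = my + rho * sy / sx * (x - mx)
cond_sd = sy * sqrt(1 - rho ** 2)
cond_direct = normal_pdf(y, cond_mean, cond_sd)

# With rho = 0 the joint density factorises into the two marginals.
joint_rho0 = bivariate_normal_pdf(x, y, mx, my, sx, sy, 0.0)
product_rho0 = normal_pdf(x, mx, sx) * normal_pdf(y, my, sy)

print(cond_from_joint, cond_direct)
print(joint_rho0, product_rho0)
```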


12.10 Limitations of the correlation coefficient
Because of the close connection between correlation coefficient
and linear regression, it is clear that the former can serve as a
satisfactory measure of the relationship between two variables only
when that relationship is of the linear type. Hence a low value of
the correlation coefficient does not rule out the possibility that the
variables are related in some other manner. If x and y have a
non-linear relationship like that in Fig. 12.4, the least-square regression
lines will be approximately parallel to the axes of co-ordinates and
hence r will be very small, although actually the relationship
between the variables may be quite strong.

Fig. 12.4 Scatter diagram, showing fairly close non-linear
relationship between x and y.

(This is how one may explain the paradox that the correlation
coefficient between the amount of fertiliser added per plot and the
yield of crop per plot is small.)

Consider, e.g., the following data, which, although artificial,
show that the correlation coefficient may be zero even when there 
is perfect (functional) dependence of one variable on another and 
may, therefore, totally fail to measure the relationship between the 
two variables. 


-[ E S E »—5 
10—x for x=0, |, ...... wos 
and hence y is an exact function of x.
But

\[ \bar{x} = 0 \]

and also

\[ \sum_i x_i y_i = 0. \]

Hence

\[ \mathrm{cov}(x, y) = \frac{1}{n}\sum_i x_i y_i - \bar{x}\bar{y} = 0, \]

implying that the correlation coefficient also is zero.


It is advisable, therefore, that one should see whether the
general relationship between x and y is linear or not (by drawing the
scatter diagram, for instance) before using r as a measure of this
relationship.
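The point that r may vanish under perfect functional dependence is easy to reproduce. The Python sketch below does not use the data of the text; it takes the hypothetical relation y = x² on the symmetric values x = −3, ..., 3, for which the mean of x and the sum of the products both vanish.

```python
from math import sqrt

def corr(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n
    var_x = sum((x - mx) ** 2 for x in xs) / n
    var_y = sum((y - my) ** 2 for y in ys) / n
    return cov / sqrt(var_x * var_y)

# Hypothetical data: y is an exact (non-linear) function of x.
xs = list(range(-3, 4))
ys = [x * x for x in xs]

sum_xy = sum(x * y for x, y in zip(xs, ys))
r = corr(xs, ys)
print(sum(xs), sum_xy, r)
```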

Again, the fact that two variables are correlated does not
necessarily mean that they are causally related, that is, that one
variable is the cause of the other. Indeed, the two variables may appear
to be correlated even when both are caused by some other variable or
variables (vide Section 13.9).

There will be very high correlation between age of husband and
age of wife, although none may be said to be caused by the other.
The high correlation is really an effect of the prevailing social norms
that lead older husbands to have older wives (and vice versa) and
younger husbands to have younger wives (and vice versa).

A spurious correlation may also arise from the non-homogeneity
of the data. Thus suppose a number of groups of individuals, each by
itself homogeneous but having varying means, are mixed up. Then
x and y may appear to be highly correlated in the combined group,
even if in each of the constituent groups the variables have little or
no correlation (see Exercise 12.14 in this connection).
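A small numerical sketch may make the effect of non-homogeneity concrete. The two groups below are hypothetical: within each group x and y are uncorrelated, but the groups have different means, and pooling them produces a correlation close to +1.

```python
from math import sqrt

def corr(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / n
    var_x = sum((x - mx) ** 2 for x in xs) / n
    var_y = sum((y - my) ** 2 for y in ys) / n
    return cov / sqrt(var_x * var_y)

# Hypothetical group 1: the four corners of a unit square (no correlation).
x1, y1 = [0, 0, 1, 1], [0, 1, 0, 1]
# Hypothetical group 2: the same pattern shifted by 10 in both variables.
x2, y2 = [10, 10, 11, 11], [10, 11, 10, 11]

r1 = corr(x1, y1)
r2 = corr(x2, y2)
r_combined = corr(x1 + x2, y1 + y2)
print(r1, r2, r_combined)
```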


12.11 Correlation index and correlation ratio 

When the general relationship between x and y is non-linear,
the correlation coefficient fails to measure the extent of their
interdependence. This happens because of the close link between
correlation coefficient and linear regression, as will be apparent from
Section 12.8.

In case the regression of any one variable (say y) on another
(say x) is non-linear, we may still like to devise a measure of the
dependence of the first on the second.

We have seen in Section 12.8 that

\[ r^2 = \frac{\mathrm{var}(Y)}{\mathrm{var}(y)} = \frac{\sum_i (Y_i - \bar{y})^2}{\sum_i (y_i - \bar{y})^2}, \]

where $Y_i$ is the predicted value of y, from the linear regression
equation, corresponding to $x = x_i$.

In other words, $r^2$ may be interpreted as the proportion of the
total variability of y which is accounted for by its linear regression
on x. (It has been called the simple coefficient of determination.)

This concept can be generalised. Suppose the appropriate
regression equation is a polynomial of the pth degree ($p \le n-1$).
Then we can define a measure of association, similar to |r|, called
the correlation index of the pth order, say $r_p$, by

\[ r_p^2 = \frac{\mathrm{var}(Y_p)}{\mathrm{var}(y)} = \frac{\sum_i (Y_{pi} - \bar{y})^2}{\sum_i (y_i - \bar{y})^2}, \tag{12.40} \]

where $Y_{pi}$ is the predicted value of y from the pth degree polynomial
regression equation corresponding to $x = x_i$. We have

\[ Y_p = a_0 + a_1 x + a_2 x^2 + \dots + a_p x^p, \]

where the constants $a_0, a_1, a_2, \dots, a_p$ are to be determined by
the least-square method. This means that $\sum_i (y_i - Y_{pi})^2$ is to be
minimised with respect to $a_0, a_1, \dots, a_p$. The normal equations
yielding the desired values of $a_0$, $a_1$, etc., are then

\[ \sum_i y_i = n a_0 + a_1\sum_i x_i + a_2\sum_i x_i^2 + \dots + a_p\sum_i x_i^p, \]

\[ \sum_i x_i y_i = a_0\sum_i x_i + a_1\sum_i x_i^2 + a_2\sum_i x_i^3 + \dots + a_p\sum_i x_i^{p+1}, \]

\[ \dots \]

\[ \sum_i x_i^p y_i = a_0\sum_i x_i^p + a_1\sum_i x_i^{p+1} + a_2\sum_i x_i^{p+2} + \dots + a_p\sum_i x_i^{2p}. \]

Now, $r_p^2 \le 1$ since

\[ \sum_i (y_i - \bar{y})^2 = \sum_i (y_i - Y_{pi})^2 + \sum_i (Y_{pi} - \bar{y})^2, \tag{12.41} \]

the sum of product terms vanishing by virtue of the above normal
equations determining $Y_{pi}$ (note that $\bar{Y}_p = \bar{y}$).
We shall show that as the degree of the polynomial increases, the
value of the correlation index cannot decrease.

Let the polynomial regression equations of degree p and degree
p−1 be, respectively,

\[ Y_p = a_0 + a_1 x + a_2 x^2 + \dots + a_p x^p \]

and

\[ Y_{p-1} = a_0' + a_1' x + a_2' x^2 + \dots + a_{p-1}' x^{p-1}, \]

the constants in each case being determined by the least-square
method.

By definition of the least-square method, then,

\[ \sum_i (y_i - a_0 - a_1 x_i - \dots - a_p x_i^p)^2 \le \sum_i (y_i - b_0 - b_1 x_i - \dots - b_p x_i^p)^2, \]

whatever the alternative set of constants $b_j$ may be. Taking, in
particular, $b_0 = a_0'$, $b_1 = a_1'$, ..., $b_{p-1} = a_{p-1}'$, $b_p = 0$, we have thus

\[ \sum_i (y_i - a_0 - a_1 x_i - \dots - a_p x_i^p)^2 \le \sum_i (y_i - a_0' - a_1' x_i - \dots - a_{p-1}' x_i^{p-1})^2, \]

i.e.

\[ \sum_i (y_i - Y_{pi})^2 \le \sum_i (y_i - Y_{p-1,i})^2. \]

This implies, by virtue of (12.41), that

\[ \sum_i (Y_{pi} - \bar{y})^2 \ge \sum_i (Y_{p-1,i} - \bar{y})^2. \]

Hence for any p,

\[ r_p^2 \ge r_{p-1}^2. \]

Taking p=2, 3, ..., n−1, successively, we thus have

\[ r^2 \le r_2^2 \le r_3^2 \le \dots \le r_{n-1}^2. \]

We shall now introduce a more general measure of the degree of
dependence of one variable on another. To this end, suppose the n
pairs of values of x and y are arranged in arrays of y according to
fixed values of x:

    x_1 | y_11, y_12, ..., y_1n_1
    x_2 | y_21, y_22, ..., y_2n_2
    ... | ...
    x_i | y_i1, y_i2, ..., y_in_i
    ... | ...
    x_k | y_k1, y_k2, ..., y_kn_k

This means that for $n_i$ of the n individuals the values of x are the
same, viz. $x_i$, while the values of y are $y_{i1}, y_{i2}, \dots, y_{in_i}$.
Suppose further that

\[ \bar{y}_i = \sum_j y_{ij}/n_i, \tag{12.42} \]

the mean of the ith array, and

\[ \bar{y} = \sum_i\sum_j y_{ij}/n = \sum_i n_i\bar{y}_i/n, \tag{12.43} \]

the grand mean of y. Also, the mean of x is

\[ \bar{x} = \sum_i n_i x_i/n. \tag{12.44} \]

If the regression of y on x were linear, a measure of the
dependence of y on x (or of the interdependence of x and y) would
be |r|, given by

\[ r = \frac{\sum_i n_i (x_i - \bar{x})(\bar{y}_i - \bar{y})}
            {\sqrt{\sum_i n_i (x_i - \bar{x})^2}\,\sqrt{\sum_i\sum_j (y_{ij} - \bar{y})^2}}, \tag{12.45} \]

and, in analogy with (12.28), we would have

\[ r^2 = \frac{\sum_i n_i (Y_i - \bar{y})^2}{\sum_i\sum_j (y_{ij} - \bar{y})^2}, \tag{12.45a} \]

where $Y_i$ is the value of y corresponding to $x_i$ as given by the
regression line of y on x.

The values $Y_i = a + b x_i$ are obtained by minimising

\[ \sum_i\sum_j (y_{ij} - a - b x_i)^2 \]

or, equivalently, by minimising

\[ \sum_i n_i (\bar{y}_i - a - b x_i)^2 \tag{12.46} \]

with respect to a and b, the normal equations being in each case

\[ \sum_i n_i\bar{y}_i = na + b\sum_i n_i x_i, \qquad
   \sum_i n_i x_i\bar{y}_i = a\sum_i n_i x_i + b\sum_i n_i x_i^2. \tag{12.47} \]

Now, it would be apparent from (12.46) that $Y_i$ is just an
approximation to $\bar{y}_i$. The expression (12.45) also shows that |r|
really purports to be a measure of the extent to which the array
mean of y for given x depends on x. In case the regression is linear,
i.e. in case $Y_i = \bar{y}_i$ for each i, |r| achieves this purpose.

But, as we have already stated, in our case the regression is not
linear and an alternative measure is called for. The discussion in the
preceding paragraph suggests such a modification of |r|, through the
replacement of $Y_i$ by $\bar{y}_i$ in (12.45a). The new measure obtained,
called the correlation ratio of y on x and denoted by $e_{yx}$, is thus the
positive square root of

\[ e_{yx}^2 = \frac{\sum_i n_i (\bar{y}_i - \bar{y})^2}{\sum_i\sum_j (y_{ij} - \bar{y})^2}. \tag{12.48} \]


In case the data are grouped into a k × l bivariate frequency table,
the k classes of x may be supposed to give k arrays of y values. For
the ith array, the total frequency is $f_{i0}$ and the array mean is

\[ \bar{y}_i = \sum_j y_j f_{ij}/f_{i0}, \]

while the grand mean is

\[ \bar{y} = \sum_j y_j f_{0j}/n. \]

The correlation ratio will then be given by the formula

\[ e_{yx}^2 = \frac{\sum_i f_{i0} (\bar{y}_i - \bar{y})^2}{\sum_j f_{0j} (y_j - \bar{y})^2}. \tag{12.48a} \]

Note that

\[ \sum_i\sum_j (y_{ij} - \bar{y})^2 = \sum_i\sum_j \{(y_{ij} - \bar{y}_i) + (\bar{y}_i - \bar{y})\}^2
   = \sum_i\sum_j (y_{ij} - \bar{y}_i)^2 + \sum_i n_i (\bar{y}_i - \bar{y})^2 \]

and, on dividing both sides by n, we may write this as

\[ s_y^2 = e_{yx}^2 s_y^2 + (1 - e_{yx}^2) s_y^2. \tag{12.49} \]

Also,

\[ \sum_i n_i (\bar{y}_i - \bar{y})^2 = \sum_i n_i \{(Y_i - \bar{y}) + (\bar{y}_i - Y_i)\}^2
   = \sum_i n_i (Y_i - \bar{y})^2 + \sum_i n_i (\bar{y}_i - Y_i)^2, \]

the sum of the product-terms vanishing because of the normal
equations yielding $Y_i$. Again, dividing both sides by n, we may
write this as

\[ e_{yx}^2 s_y^2 = r^2 s_y^2 + (e_{yx}^2 - r^2) s_y^2. \tag{12.50} \]

Since

\[ (1 - e_{yx}^2) s_y^2 = \frac{1}{n}\sum_i\sum_j (y_{ij} - \bar{y}_i)^2 \ge 0 \]

and

\[ (e_{yx}^2 - r^2) s_y^2 = \frac{1}{n}\sum_i n_i (\bar{y}_i - Y_i)^2 \ge 0, \]

we immediately have

\[ r^2 \le e_{yx}^2 \le 1. \tag{12.51} \]

It follows from (12.50) that $e_{yx}^2 = r^2$ if, and only if, $Y_i = \bar{y}_i$ for
each i. Hence the difference $e_{yx}^2 - r^2$ measures the extent to which the
true regression of y on x (i.e. the true relationship between the array
mean of y for given x and x) departs from linearity. This is another,
and the more common, use to which the correlation ratio is put.

Actually, the correlation ratio serves as the least upper bound to
the numerical value of the correlation coefficient or a correlation
index.

Just as $r^2$ may be interpreted as the proportion of the total
variation of y that is explained by its linear regression on x, $e_{yx}^2$ may
be interpreted, because of (12.49), as the proportion that is explained
by the array means of y for given x values. But whereas the
correlation coefficient is symmetrical in x and y, the correlation ratio is
not, so that generally $e_{yx}$ and $e_{xy}$ will not be equal.
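The inequality (12.51) can be illustrated on a small set of arrays. In the Python sketch below the four arrays are hypothetical and deliberately non-linear in their means; the code evaluates the squared correlation ratio from (12.48) and r² from (12.45), with variable names chosen here for convenience.

```python
# Hypothetical arrays of y for four fixed values of x.
arrays = {1: [2, 4], 2: [5, 7, 6], 3: [4, 6], 4: [1, 3]}

n = sum(len(ys) for ys in arrays.values())
y_bar = sum(sum(ys) for ys in arrays.values()) / n        # grand mean, (12.43)
x_bar = sum(x * len(ys) for x, ys in arrays.items()) / n  # mean of x, (12.44)

def array_mean(ys):
    return sum(ys) / len(ys)

# Total and between-array sums of squares of y.
total_ss = sum((y - y_bar) ** 2 for ys in arrays.values() for y in ys)
between_ss = sum(len(ys) * (array_mean(ys) - y_bar) ** 2 for ys in arrays.values())

# Squared correlation ratio of y on x, formula (12.48).
e_sq = between_ss / total_ss

# Squared correlation coefficient, from formula (12.45).
s_xy = sum(len(ys) * (x - x_bar) * (array_mean(ys) - y_bar) for x, ys in arrays.items())
s_xx = sum(len(ys) * (x - x_bar) ** 2 for x, ys in arrays.items())
r_sq = s_xy ** 2 / (s_xx * total_ss)

print(r_sq, e_sq, e_sq - r_sq)
```

Here the array means rise and then fall, so the linear regression explains little while the array means explain most of the variation; the difference `e_sq - r_sq` is correspondingly large.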


Questions and exercises 


12.1 Distinguish between the correlation approach and the 
regression approach to the analysis of bivariate data. 

12.2 Define correlation coefficient. State and prove its
important properties. What are its limitations?

12.3 Define the term regression. State and prove all important
results relating to regression lines.

12.4 Explain what is meant by a theoretical distribution in two
variables. Define marginal distributions and array distributions.

12.5 Define the bivariate normal distribution and state its
important properties.

12.6 Using Cauchy-Schwarz inequality, prove that r lies between
−1 and +1. Hence interpret the cases $r = \pm 1$.

12.7 Show that the angles between the two regression lines (of y
on x and of x on y) are

\[ \tan^{-1}\left(\pm\frac{1 - r^2}{r}\cdot\frac{s_x s_y}{s_x^2 + s_y^2}\right) \]

and interpret the cases where r=0 and $r = \pm 1$.

12.8 What is correlation ratio? Show that $0 \le r^2 \le e_{yx}^2 \le 1$ and
discuss the following cases:
Br, Fateh, =r, B ot, t=, = f:

12.9(a) Show that formula (12.48) for the correlation ratio may
be reduced to the form

\[ e_{yx}^2 = \frac{\sum_i n_i\bar{y}_i^2 - n\bar{y}^2}{\sum_i\sum_j y_{ij}^2 - n\bar{y}^2}
            = \frac{\sum_i (\sum_j y_{ij})^2/n_i - (\sum_i\sum_j y_{ij})^2/n}
                   {\sum_i\sum_j y_{ij}^2 - (\sum_i\sum_j y_{ij})^2/n}. \]

(b) Show that, for a bivariate frequency table, the correlation
ratio may be computed by the formula

\[ e_{yx}^2 = \frac{\sum_i V_i^2/f_{i0} - (\sum_i V_i)^2/n}{\sum_j v_j^2 f_{0j} - (\sum_i V_i)^2/n}. \]

Hence determine the correlation ratio of y on x for Table 12.2
and comment. Partial ans. $e_{yx}$ = 0.858.

12.10 Show that the correlation ratio $e_{yx}$ is the simple correlation
coefficient between y and the array mean of y corresponding to x.

12.11 Let x and y be independent variables with standard
deviations $\sigma_x$ and $\sigma_y$. Show that the correlation coefficient between
x and x+y is

\[ \sigma_x\Big/\sqrt{\sigma_x^2 + \sigma_y^2}. \]

12.12(a) Let x and y be jointly normally distributed with equal
means and equal variances and with a positive correlation coefficient.
Show that the conditional mean of y for a given x > μ will be less
than x and vice versa (where μ is the common mean of x and y).

(b) Hence account for the paradox that for a tall father, on 
the average, the son (adult) though tall is shorter than the father 
himself, while for a tall son (adult), on the average, the father 
though tall is shorter than the son himself. (The feature has been 
termed ‘regression to mediocrity’ and is at the origin of the use of 
the term ‘regression’ in statistical literature.) 

1213 What would be the normal equations if a polynomial 
regression Y=ag-+@%++++++ +-a,x’ were fitted to the given values of 
x and y? Suggest a measure of the usefulness of this regression 
equation as a predicting formula. 

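12.14 Let there be k groups of data on x and y, with means $\bar{x}_i$ and $\bar{y}_i$, variances $s_{xi}^2$ and $s_{yi}^2$, and correlations $r_i$ (i = 1, 2, ..., k). Show that the correlation for the combined data is
\[ r = \frac{\sum_i n_i r_i s_{xi} s_{yi} + \sum_i n_i (\bar{x}_i - \bar{x})(\bar{y}_i - \bar{y})}{\sqrt{\left[ \sum_i n_i s_{xi}^2 + \sum_i n_i (\bar{x}_i - \bar{x})^2 \right] \left[ \sum_i n_i s_{yi}^2 + \sum_i n_i (\bar{y}_i - \bar{y})^2 \right]}}, \]
where $\bar{x}$ and $\bar{y}$ are the grand means of x and y and $n_i$ is the number of pairs in the ith group.
Hence explain the phenomenon that $r_i$ may be zero for each i and yet r may be non-zero.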



12.15 Let x and y be subject to observational errors, so that what one observes really are $x' = x + e_1$ and $y' = y + e_2$ instead of x and y. If $e_1$ and $e_2$ are independent of x and y, and if $e_1$ and $e_2$ are also mutually independent, show that the correlation coefficient between x and y becomes numerically smaller owing to the errors. (This is called the 'attenuation effect'.)

12.16 A gunner is aiming at the centre of a rectangular target, 40 ft. wide and 60 ft. high, at a given distance. Suppose the actual point of hit is (x, y) with origin at the centre, such that x and y are independently normally distributed with zero means and standard deviations $\sigma_x = 16$ ft. and $\sigma_y = 20$ ft. If four shots are fired, find the probability that the target will be hit at least once. Ans. 0.9899.

12.17 The following data relate to the stature (x) and sitting height (y), both in cm., for each of 30 people of a particular Indian caste :

      x       y          x       y          x       y
    172.8   83.9       157.7   77.7       170.2   83.4
    166.0   83.6       146.7   76.4       153.3   79.9
    164.1   81.3       158.2   77.2       173.7   86.?
    164.4   85.4       155.3   80.1       155.8   78.6
    168.8   83.9       151.5   76.9       158.0   80.1
    165.2   81.1       161.1   81.5       157.2   81.6
    170.0   84.9       156.3   80.9       156.2   78.6
    163.5    ?         169.4   83.1       168.2   82.5
    169.4   8?.9       159.9   84.2       164.4   84.1
    159.1   79.6       161.7   80.3       165.5   87.1

(a) Represent the data by means of a scatter diagram.

(b) Compute the correlation coefficient of x and y.
Ans. $r = 0.839$.

(c) Obtain the linear regression equations of y on x and of x on y. Hence determine what the sitting height of a man is expected to be if his stature is 175 cm.




12.18 During an investigation in an agricultural farm in Bengal, the length (in cm.) of green jute plant and the weight (in gm.) of dry jute fibre were observed for 350 plants. With these data, the following bivariate frequency table was obtained :

[Bivariate frequency table: weight of dry jute fibre (gm.), class mark, against length of green plant (cm.), class mark. The cell frequencies are not legible in this copy.]

From the above table, compute the coefficient of correlation of the two variables. Also, find the linear regression equation of weight of dry fibre on length of green plant.
Ans. $r = 0.755$ ; regression eqn. : $Y = 8.379 + 0.0756x$.


12.19 The difference between upper face length (y) and nasal length (x), both measured in mm., is given for 15 Indian adult males :

    15    13    15
    15    12    19
    13    12    19
    14    15    19
    11    15    14

Calculate the correlation coefficient of x and y and the linear regression of y on x, given that for these people $\bar{y} = 49.34$ mm., $s_x = 3.53$ mm. and $s_y = 4.30$ mm.
Partial ans. $r = 0.820$, $b_{yx} = 0.999$.


12.20 For 20 army personnel, the regression of weight of kidneys (y) on weight of heart (x), both measured in oz., is
\[ Y = 0.399x + 6.934, \]
and the regression of weight of heart on weight of kidneys is
\[ X = 1.212y - 2.461. \]
Find the correlation between the two variables and also their means. Can you find their s.d.s as well ?
Partial ans. $r = 0.695$, $\bar{x} = 11.509$, $\bar{y} = 11.526$.


12.21 Consider the following data : 


F | Seg yael e r a Sar A A Aa ea a | 
> ol 25 34 39 41 38 35 28 08 
Find $r_{xy}$ and comment. Partial ans. $r = 0.054$.


12.22 The figures of production of crude petroleum and production of wheat flour in India are given below for a number of years :

    Year                                           1958   1959   1960   1961   1962   1963   1964   1965
    Production of crude petroleum ('000 tonnes)     439    450    454    514  1,077  1,653  2,212  2,176
    Production of wheat flour ('000 tonnes)         830    976    995  1,002  1,202  1,418  1,819  1,604

Compute the correlation coefficient and comment.

(Such purely accidental correlation between two time series, one having no causal significance, is called nonsense correlation.)


SUGGESTED READING 


[1] Ezekiel, M. and Fox, K. A. Methods of Correlation and Regression Analysis (Chs. 5-9). John Wiley, 1959.

[2] Goulden, C. H. Methods of Statistical Analysis (Chs. 6-7). John Wiley, 1952, and Asia Publishing House, 1959.

[3] Kenney, J. F. and Keeping, E. S. Mathematics of Statistics, Part I (Ch. 15). Van Nostrand, 1954, and Affiliated East-West Press.

[4] Moore, P. G. Principles of Statistical Techniques (Ch. 14). Cambridge University Press, 1958.

[5] Snedecor, G. W. Statistical Methods (Chs. 6, 7, 15). Iowa State University Press, 1967, and Oxford & IBH, 1968.

[6] Szulc, S. Statistical Methods (Chs. 15, 16). Pergamon Press, Oxford, 1965.

[7] Yule, G. U. “Why do we sometimes get nonsense correlation 
between time-series, etc. ?” Jour Roy. Stat. Soc., 89 (1926), 


pp. l-: 


13 MULTIVARIATE FREQUENCY DISTRIBUTIONS


13.1 Multivariate data
In some investigations, data may be collected, for the given set of individuals, on a number of (more than two) variables at the same time. Thus the data may relate to the scores obtained by each of a number of high school students in each of five subjects, say, English, major vernacular, mathematics, history and elementary science. In an agricultural experiment with a variety of wheat, the data may relate to the amount of a fertiliser added, the number of man-hours spent in tilling the land, the yield of crop and the yield of straw.
Such data may also be arranged into a frequency distribution as in the univariate or the bivariate case. If there are p variables $x_1, x_2, \ldots, x_p$ with $k_1, k_2, \ldots, k_p$ classes, respectively, the distribution will be given by $k_1 \times k_2 \times \cdots \times k_p$ cell frequencies, the frequency in each cell being the number of individuals belonging simultaneously to the corresponding $x_1$-class, $x_2$-class, ..., $x_p$-class. From this joint distribution of the p variables, one may also obtain the marginal distribution of any $p'$ of the variables ($1 \le p' \le p-1$) or the conditional distribution of any $p'$ of the variables for given values of $p''$ of the other variables ($p', p'' \ge 1$ and $p' + p'' \le p$). These are defined here in a way similar to that in the bivariate case.
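As a small illustration of these definitions, the sketch below stores a joint frequency distribution of p = 3 variables as a three-dimensional array and derives a marginal and a conditional distribution from it. The cell frequencies are invented for the purpose and NumPy is merely one convenient tool; nothing here comes from the text's own data.

```python
import numpy as np

# Hypothetical joint frequency table for p = 3 variables with
# k1 = 2, k2 = 3 and k3 = 2 classes (cell counts invented for illustration).
joint = np.array([
    [[4, 1], [3, 2], [0, 5]],
    [[2, 2], [6, 1], [1, 3]],
])

n = joint.sum()  # total number of individuals

# Marginal distribution of (x1, x2): add the frequencies over the x3-classes.
marginal_12 = joint.sum(axis=2)

# Marginal distribution of x1 alone: add over the x2- and x3-classes.
marginal_1 = joint.sum(axis=(1, 2))

# Conditional distribution of x1 for the first x2-class and the second
# x3-class, expressed as relative frequencies.
cell = joint[:, 0, 1]
conditional_1 = cell / cell.sum()

print(n, marginal_12.tolist(), marginal_1.tolist(), conditional_1)
```

Each marginal table is obtained by summing out the unwanted variables, and each conditional table by fixing the classes of the conditioning variables and rescaling the remaining frequencies.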
13.2 Multiple regression
As with bivariate data, here too, it may be that one of the p variables, say $x_1$, is of primary interest to us and we consider $x_2, x_3, \ldots, x_p$ together with $x_1$ in view of their possible influence on the latter. The object may be to build up a relationship between the 'dependent variable' (regressand), $x_1$, and the 'independent variables' (regressors), $x_2, x_3, \ldots, x_p$, with the idea of using this relationship for predicting the value of the regressand from a knowledge of the values of the regressors. Thus, in estimating the rainfall at a place in a year, it is appropriate to consider the effects of the latitude, the longitude and the altitude of the place on rainfall. Similarly, in estimating the


yield of a crop in a year, it is proper to take into account the 
effects of, say, rainfall, average temperature and average humidity, 
during the period between the sowing and the harvesting of the crop. 
And common sense suggests that the higher the number of indepen- 
dent variables, the better is the prediction likely to be. 


Let us assume that the relationship between $x_1$ and $x_2, x_3, \ldots, x_p$ is, at least in an approximate sense, given by an equation of the form
\[ X_{1\cdot 23\ldots p} = a + b_2 x_2 + b_3 x_3 + \cdots + b_p x_p. \tag{13.1} \]

Our data here will consist of p values, corresponding to the p variables, for each of n individuals. The values of the variables for the $\alpha$th individual may be denoted by $x_{1\alpha}, x_{2\alpha}, \ldots, x_{p\alpha}$ ($\alpha = 1, 2, \ldots, n$).

In order to determine the constants $a, b_2, b_3, \ldots, b_p$ on the basis of the data, we again make use of the least-square method.

If we denote by $x_{1\cdot 23\ldots p}$ the difference $x_1 - X_{1\cdot 23\ldots p}$, then the error of estimate corresponding to the $\alpha$th individual is $x_{1\cdot 23\ldots p,\alpha}$. The least-square method means that the constants $a, b_2, b_3, \ldots, b_p$ are to be so determined that
\[ \sum_\alpha x_{1\cdot 23\ldots p,\alpha}^2 = \sum_\alpha (x_{1\alpha} - a - b_2 x_{2\alpha} - \cdots - b_p x_{p\alpha})^2 \tag{13.2} \]
is a minimum.
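The normal equations in this case (obtained by equating to zero the partial derivatives of (13.2) with respect to $a, b_2, b_3, \ldots, b_p$) are
\[ \begin{aligned}
\sum_\alpha x_{1\cdot 23\ldots p,\alpha} &= 0,\\
\sum_\alpha x_{2\alpha}\, x_{1\cdot 23\ldots p,\alpha} &= 0,\\
\sum_\alpha x_{3\alpha}\, x_{1\cdot 23\ldots p,\alpha} &= 0,\\
&\;\;\vdots\\
\sum_\alpha x_{p\alpha}\, x_{1\cdot 23\ldots p,\alpha} &= 0
\end{aligned} \tag{13.3a} \]
or
\[ \begin{aligned}
\sum x_{1\alpha} &= na + b_2 \sum x_{2\alpha} + b_3 \sum x_{3\alpha} + \cdots + b_p \sum x_{p\alpha},\\
\sum x_{2\alpha} x_{1\alpha} &= a \sum x_{2\alpha} + b_2 \sum x_{2\alpha}^2 + b_3 \sum x_{2\alpha} x_{3\alpha} + \cdots + b_p \sum x_{2\alpha} x_{p\alpha},\\
\sum x_{3\alpha} x_{1\alpha} &= a \sum x_{3\alpha} + b_2 \sum x_{3\alpha} x_{2\alpha} + b_3 \sum x_{3\alpha}^2 + \cdots + b_p \sum x_{3\alpha} x_{p\alpha},\\
&\;\;\vdots\\
\sum x_{p\alpha} x_{1\alpha} &= a \sum x_{p\alpha} + b_2 \sum x_{p\alpha} x_{2\alpha} + b_3 \sum x_{p\alpha} x_{3\alpha} + \cdots + b_p \sum x_{p\alpha}^2.
\end{aligned} \tag{13.3b} \]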


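The first equation gives, on being divided by n,
\[ \bar{x}_1 = a + b_2 \bar{x}_2 + b_3 \bar{x}_3 + \cdots + b_p \bar{x}_p, \tag{13.4} \]
which shows incidentally that the mean point $(\bar{x}_1, \bar{x}_2, \ldots, \bar{x}_p)$ necessarily satisfies the prediction equation.

Multiplying (13.4) by $n\bar{x}_2, n\bar{x}_3, \ldots, n\bar{x}_p$ and subtracting from the second, third, ..., pth equation, respectively, of the system (13.3b), we have (p-1) equations determining the b's, viz.
\[ \begin{aligned}
S_{12} &= b_2 S_{22} + b_3 S_{23} + \cdots + b_p S_{2p},\\
S_{13} &= b_2 S_{32} + b_3 S_{33} + \cdots + b_p S_{3p},\\
&\;\;\vdots\\
S_{1p} &= b_2 S_{p2} + b_3 S_{p3} + \cdots + b_p S_{pp},
\end{aligned} \tag{13.5} \]
where
\[ S_{ij} = \sum_\alpha x_{i\alpha} x_{j\alpha} - n \bar{x}_i \bar{x}_j = \sum_\alpha (x_{i\alpha} - \bar{x}_i)(x_{j\alpha} - \bar{x}_j). \tag{13.6} \]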


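It will be assumed that the matrix
\[ \begin{pmatrix} S_{22} & S_{23} & \cdots & S_{2p} \\ S_{32} & S_{33} & \cdots & S_{3p} \\ \vdots & \vdots & & \vdots \\ S_{p2} & S_{p3} & \cdots & S_{pp} \end{pmatrix} \]
or, equivalently, that the matrix
\[ \begin{pmatrix} s_{22} & s_{23} & \cdots & s_{2p} \\ s_{32} & s_{33} & \cdots & s_{3p} \\ \vdots & \vdots & & \vdots \\ s_{p2} & s_{p3} & \cdots & s_{pp} \end{pmatrix}, \tag{13.7} \]
where
\[ s_{ij} = \frac{1}{n} S_{ij} = \begin{cases} \operatorname{cov}(x_i, x_j) & \text{if } i \ne j, \\ \operatorname{var}(x_i) & \text{if } i = j, \end{cases} \tag{13.8} \]
is non-singular (i.e. is of rank p-1, its full order).
The whole $p \times p$ matrix
\[ S = (s_{ij}) \]
is called the variance-covariance (or dispersion) matrix of $x_1, x_2, \ldots, x_p$.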
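The above assumption of non-singularity for the matrix (13.7) will imply that (13.5) has the unique solution
\[ \begin{pmatrix} b_2 \\ b_3 \\ \vdots \\ b_p \end{pmatrix} = \begin{pmatrix} s_{22} & s_{23} & \cdots & s_{2p} \\ s_{32} & s_{33} & \cdots & s_{3p} \\ \vdots & \vdots & & \vdots \\ s_{p2} & s_{p3} & \cdots & s_{pp} \end{pmatrix}^{-1} \begin{pmatrix} s_{12} \\ s_{13} \\ \vdots \\ s_{1p} \end{pmatrix} \tag{13.9} \]
or that
\[ b_j = s_{12} s^{2j} + s_{13} s^{3j} + \cdots + s_{1p} s^{pj} = \frac{\begin{vmatrix} s_{22} & s_{23} & \cdots & s_{2(j-1)} & s_{21} & s_{2(j+1)} & \cdots & s_{2p} \\ s_{32} & s_{33} & \cdots & s_{3(j-1)} & s_{31} & s_{3(j+1)} & \cdots & s_{3p} \\ \vdots & & & & & & & \vdots \\ s_{p2} & s_{p3} & \cdots & s_{p(j-1)} & s_{p1} & s_{p(j+1)} & \cdots & s_{pp} \end{vmatrix}}{\begin{vmatrix} s_{22} & s_{23} & \cdots & s_{2p} \\ s_{32} & s_{33} & \cdots & s_{3p} \\ \vdots & \vdots & & \vdots \\ s_{p2} & s_{p3} & \cdots & s_{pp} \end{vmatrix}} \tag{13.9a} \]
(for j = 2, 3, ..., p), $s^{ij}$ denoting the (i, j)th element of the inverse of the matrix (13.7).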
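Since $s_{ij} = r_{ij} s_i s_j$, where
$s_i$ = standard deviation of $x_i$,
$s_j$ = standard deviation of $x_j$
and $r_{ij}$ = correlation coefficient of $x_i$ and $x_j$,
we may also write, on simplifying (13.9a),
\[ b_j = (-1)^j \, \frac{s_1}{s_j} \times \frac{\begin{vmatrix} r_{21} & r_{22} & \cdots & r_{2(j-1)} & r_{2(j+1)} & \cdots & r_{2p} \\ r_{31} & r_{32} & \cdots & r_{3(j-1)} & r_{3(j+1)} & \cdots & r_{3p} \\ \vdots & & & & & & \vdots \\ r_{p1} & r_{p2} & \cdots & r_{p(j-1)} & r_{p(j+1)} & \cdots & r_{pp} \end{vmatrix}}{\begin{vmatrix} r_{22} & r_{23} & \cdots & r_{2p} \\ r_{32} & r_{33} & \cdots & r_{3p} \\ \vdots & \vdots & & \vdots \\ r_{p2} & r_{p3} & \cdots & r_{pp} \end{vmatrix}}. \tag{13.10} \]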


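We shall write $\mathbf{R}$ for the matrix
\[ \begin{pmatrix} r_{11} & r_{12} & \cdots & r_{1p} \\ r_{21} & r_{22} & \cdots & r_{2p} \\ \vdots & \vdots & & \vdots \\ r_{p1} & r_{p2} & \cdots & r_{pp} \end{pmatrix}, \tag{13.11} \]
which is the correlation matrix of $x_1, x_2, \ldots, x_p$, $R$ for the corresponding determinant and $R_{ij}$ for the co-factor of $r_{ij}$ in $R$. We see that the determinant in the numerator of (13.10) is the minor of $r_{1j}$ (in $R$) and hence is $(-1)^{1+j} \times$ the co-factor of $r_{1j}$, while the determinant in the denominator is the minor (and also the co-factor) of $r_{11}$. Hence
\[ b_j = (-1)^j \times \frac{s_1}{s_j} \times (-1)^{1+j} \frac{R_{1j}}{R_{11}} = -\frac{R_{1j}}{R_{11}} \cdot \frac{s_1}{s_j}, \quad \text{for } j = 2, 3, \ldots, p, \tag{13.12} \]
while, from (13.4),
\[ a = \bar{x}_1 + \sum_{j=2}^{p} \frac{R_{1j}}{R_{11}} \cdot \frac{s_1}{s_j}\, \bar{x}_j. \tag{13.13} \]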


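Thus the prediction equation (called the multiple regression equation of $x_1$ on $x_2, x_3, \ldots, x_p$) becomes
\[ X_{1\cdot 23\ldots p} = \bar{x}_1 - \frac{R_{12}}{R_{11}} \cdot \frac{s_1}{s_2}(x_2 - \bar{x}_2) - \frac{R_{13}}{R_{11}} \cdot \frac{s_1}{s_3}(x_3 - \bar{x}_3) - \cdots - \frac{R_{1p}}{R_{11}} \cdot \frac{s_1}{s_p}(x_p - \bar{x}_p). \tag{13.14} \]

The coefficient $b_j = -\dfrac{R_{1j}}{R_{11}} \cdot \dfrac{s_1}{s_j}$ in (13.14) is called the partial regression coefficient of $x_1$ on $x_j$ for fixed $x_2, \ldots, x_{j-1}, x_{j+1}, \ldots, x_p$, and is often written in the more explicit form
\[ b_{1j\cdot 23\ldots(j-1)(j+1)\ldots p}. \tag{13.15} \]

Evidently, it gives the amount by which the predicted value $X_{1\cdot 23\ldots p}$ increases when $x_j$ is increased by a unit amount, the other independent variables being kept fixed.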
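The co-factor formula (13.12) can be checked numerically. The following is a minimal sketch, assuming NumPy and using the 18 cinchona observations of Example 13.1 as read from Tables 13.1 and 13.2; the helper `cofactor` and all variable names are illustrative and not part of the text.

```python
import numpy as np

# Yield of dry bark (x1, oz.), height (x2, in.) and girth (x3, in.)
# for the 18 cinchona plants of Example 13.1.
x1 = np.array([19, 51, 30, 42, 25, 18, 44, 56, 38,
               32, 25, 10, 20, 27, 13, 49, 27, 55], dtype=float)
x2 = np.array([8, 15, 11, 21, 7, 5, 10, 13, 12,
               13, 5, 6, 4, 8, 7, 12, 6, 16], dtype=float)
x3 = np.array([4, 5, 3, 3, 2, 1, 4, 6, 3,
               4, 2, 3, 4, 4, 3, 5, 3, 7], dtype=float)
data = np.vstack([x1, x2, x3])   # one row per variable

R = np.corrcoef(data)            # correlation matrix of (13.11)
s = data.std(axis=1)             # standard deviations with divisor n
means = data.mean(axis=1)

def cofactor(m, i, j):
    """Co-factor of element (i, j) of the square matrix m (0-based indices)."""
    minor = np.delete(np.delete(m, i, axis=0), j, axis=1)
    return (-1) ** (i + j) * np.linalg.det(minor)

R11 = cofactor(R, 0, 0)
# Partial regression coefficients b_j = -(R_1j / R_11)(s_1 / s_j), as in (13.12).
b = np.array([-(s[0] / s[j]) * cofactor(R, 0, j) / R11 for j in (1, 2)])
# Intercept from (13.4): the mean point satisfies the prediction equation.
a = means[0] - b @ means[1:]

# Cross-check against a direct least-squares fit of x1 on x2 and x3.
design = np.column_stack([np.ones(len(x1)), x2, x3])
ls_coef = np.linalg.lstsq(design, x1, rcond=None)[0]
print(a, b, ls_coef)
```

The two routes should agree to rounding error, since (13.12) is just the solution of the normal equations written in terms of correlations.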


Example 13.1 The following table shows, for each of 18 cinchona plants, the yield of dry bark (in oz.), the height (in inches) and the girth (in inches) at a height of 6" from the ground.

Supposing we denote these variables by $x_1$, $x_2$ and $x_3$, respectively, it is of practical interest to study the dependence of $x_1$ on $x_2$ and $x_3$. For this purpose, let us first determine the multiple regression equation of $x_1$ on $x_2$ and $x_3$. The preliminary calculations are shown in Table 13.2.

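TABLE 13.1

YIELD OF DRY BARK, HEIGHT AND GIRTH AT A LEVEL 6" ABOVE GROUND FOR 18 CINCHONA PLANTS

    Plant   Yield of    Height   Girth        Plant   Yield of    Height   Girth
     no.    dry bark    (in.)    (in.)         no.    dry bark    (in.)    (in.)
             (oz.)                                     (oz.)
      1        19          8       4            10       32         13       4
      2        51         15       5            11       25          5       2
      3        30         11       3            12       10          6       3
      4        42         21       3            13       20          4       4
      5        25          7       2            14       27          8       4
      6        18          5       1            15       13          7       3
      7        44         10       4            16       49         12       5
      8        56         13       6            17       27          6       3
      9        38         12       3            18       55         16       7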

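This gives
\[ \bar{x}_1 = 581/18 = 32.28 \text{ oz.}, \quad \bar{x}_2 = 179/18 = 9.94 \text{ in.}, \quad \bar{x}_3 = 66/18 = 3.67 \text{ in.}, \]
\[ s_1 = \frac{\sqrt{n\sum x_{1\alpha}^2 - \left(\sum x_{1\alpha}\right)^2}}{n} = \frac{\sqrt{63{,}713}}{18} = \frac{252.41}{18} = 14.02 \text{ oz.}, \]
\[ s_2 = \frac{\sqrt{6{,}353}}{18} = \frac{79.706}{18} = 4.43 \text{ in.}, \]
\[ s_3 = \frac{\sqrt{648}}{18} = \frac{25.456}{18} = 1.41 \text{ in.}, \]
\[ r_{12} = \frac{n\sum x_{1\alpha}x_{2\alpha} - \left(\sum x_{1\alpha}\right)\left(\sum x_{2\alpha}\right)}{\sqrt{n\sum x_{1\alpha}^2 - \left(\sum x_{1\alpha}\right)^2}\,\sqrt{n\sum x_{2\alpha}^2 - \left(\sum x_{2\alpha}\right)^2}} = \frac{15{,}449}{\sqrt{63{,}713}\,\sqrt{6{,}353}} = \frac{15{,}449}{252.41 \times 79.706} = \frac{15{,}449}{20{,}118.59} = 0.768, \]
\[ r_{13} = \frac{4{,}620}{\sqrt{63{,}713}\,\sqrt{648}} = \frac{4{,}620}{252.41 \times 25.456} = \frac{4{,}620}{6{,}425.35} = 0.719 \]
and
\[ r_{23} = \frac{1{,}056}{\sqrt{6{,}353}\,\sqrt{648}} = \frac{1{,}056}{79.706 \times 25.456} = \frac{1{,}056}{2{,}029.00} = 0.520. \]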

TABLE 13.2

CALCULATION OF SUMS, SUMS OF SQUARES AND SUMS OF PRODUCTS FOR THE DATA OF TABLE 13.1

            x1     x2     x3     x1^2    x2^2    x3^2    x1x2    x1x3    x2x3
            19      8      4      361      64      16     152      76      32
            51     15      5    2,601     225      25     765     255      75
            30     11      3      900     121       9     330      90      33
            42     21      3    1,764     441       9     882     126      63
            25      7      2      625      49       4     175      50      14
            18      5      1      324      25       1      90      18       5
            44     10      4    1,936     100      16     440     176      40
            56     13      6    3,136     169      36     728     336      78
            38     12      3    1,444     144       9     456     114      36
            32     13      4    1,024     169      16     416     128      52
            25      5      2      625      25       4     125      50      10
            10      6      3      100      36       9      60      30      18
            20      4      4      400      16      16      80      80      16
            27      8      4      729      64      16     216     108      32
            13      7      3      169      49       9      91      39      21
            49     12      5    2,401     144      25     588     245      60
            27      6      3      729      36       9     162      81      18
            55     16      7    3,025     256      49     880     385     112

    Total  581    179     66   22,293   2,133     278   6,636   2,387     715


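If the multiple regression equation is denoted by
\[ X_{1\cdot 23} = a + b_{12\cdot 3}\, x_2 + b_{13\cdot 2}\, x_3, \]
then
\[ b_{12\cdot 3} = \frac{r_{12} - r_{13}r_{23}}{1 - r_{23}^2} \times \frac{s_1}{s_2} = \frac{0.394}{0.730} \times \frac{14.02}{4.43} = \frac{5.52}{3.23} = 1.71 \]
and
\[ b_{13\cdot 2} = \frac{r_{13} - r_{12}r_{23}}{1 - r_{23}^2} \times \frac{s_1}{s_3} = \frac{0.320}{0.730} \times \frac{14.02}{1.41} = \frac{4.49}{1.03} = 4.36, \]
while $a = \bar{x}_1 - b_{12\cdot 3}\bar{x}_2 - b_{13\cdot 2}\bar{x}_3 = -0.72$ (all in proper units).

Hence the multiple regression equation is obtained as
\[ X_{1\cdot 23} = -0.72 + 1.71 x_2 + 4.36 x_3. \]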
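The hand calculation above can be retraced in a few lines. This sketch starts from the rounded means, standard deviations and correlations quoted in the example and rounds the two coefficients to two decimals before computing the constant term, as the worked figures appear to do; the function name `predict` is illustrative. A full-precision fit to the raw data gives slightly different later digits.

```python
# Summary statistics as rounded in Example 13.1.
mean1, mean2, mean3 = 32.28, 9.94, 3.67
s1, s2, s3 = 14.02, 4.43, 1.41
r12, r13, r23 = 0.768, 0.719, 0.520

# Partial regression coefficients for the case p = 3.
b12_3 = (r12 - r13 * r23) / (1 - r23**2) * s1 / s2
b13_2 = (r13 - r12 * r23) / (1 - r23**2) * s1 / s3

# Round to two decimals before finding the constant term (assumption:
# this mirrors the order of rounding in the hand calculation).
b12_3, b13_2 = round(b12_3, 2), round(b13_2, 2)
a = mean1 - b12_3 * mean2 - b13_2 * mean3

def predict(x2, x3):
    """Predicted yield of dry bark (oz.) for height x2 and girth x3 (in.)."""
    return a + b12_3 * x2 + b13_2 * x3

print(b12_3, b13_2, round(a, 2))
print(predict(mean2, mean3))  # the mean point lies on the fitted plane
```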


13.3 Multiple correlation

In studying the dependence of $x_1$ on a set of independent variables, we may want to know to what extent $x_1$ is influenced by the independent variables. In the case of two variables, x and y, we have seen that $|r_{xy}| = |r|$ serves as a measure of the strength of the interdependence of x and y or, if y may be looked upon as dependent on x, of the extent to which x influences y. Generalising this approach, we may take the simple correlation between $x_1$ and $X_{1\cdot 23\ldots p}$, i.e. the value of $x_1$ given by the multiple regression equation of $x_1$ on $x_2, x_3, \ldots, x_p$, as a measure of the joint influence of $x_2, x_3, \ldots, x_p$ on $x_1$. It is called the multiple correlation coefficient of $x_1$ on $x_2, x_3, \ldots, x_p$ and is denoted by $r_{1\cdot 23\ldots p}$. Obviously, then,
\[ r_{1\cdot 23\ldots p} = \frac{\operatorname{cov}(x_1, X_{1\cdot 23\ldots p})}{\sqrt{\operatorname{var}(x_1)\operatorname{var}(X_{1\cdot 23\ldots p})}}. \tag{13.16} \]
According to our notation,
\[ \operatorname{var}(x_1) = s_1^2. \]

Again, the mean of the predicted value $X_{1\cdot 23\ldots p}$ is, from (13.14),
\[ \frac{1}{n}\sum_\alpha X_{1\cdot 23\ldots p,\alpha} = \bar{x}_1 - \frac{R_{12}}{R_{11}} \cdot \frac{s_1}{s_2} \cdot \frac{1}{n}\sum_\alpha (x_{2\alpha} - \bar{x}_2) - \cdots - \frac{R_{1p}}{R_{11}} \cdot \frac{s_1}{s_p} \cdot \frac{1}{n}\sum_\alpha (x_{p\alpha} - \bar{x}_p) = \bar{x}_1. \tag{13.17} \]

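Also, since
\[ x_1 = X_{1\cdot 23\ldots p} + x_{1\cdot 23\ldots p}, \]
the mean of $x_{1\cdot 23\ldots p}$ is zero and
\[ \operatorname{cov}(x_1, X_{1\cdot 23\ldots p}) = \operatorname{var}(X_{1\cdot 23\ldots p}) + \operatorname{cov}(X_{1\cdot 23\ldots p}, x_{1\cdot 23\ldots p}) = \operatorname{var}(X_{1\cdot 23\ldots p}), \tag{13.18} \]
because
\[ n \operatorname{cov}(X_{1\cdot 23\ldots p}, x_{1\cdot 23\ldots p}) = \sum_\alpha X_{1\cdot 23\ldots p,\alpha}\, x_{1\cdot 23\ldots p,\alpha} = 0, \tag{13.19} \]
from the normal equations (13.3a), $X_{1\cdot 23\ldots p}$ being a linear function of $x_2, x_3, \ldots, x_p$.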
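But
\[ \begin{aligned}
\operatorname{cov}(x_1, X_{1\cdot 23\ldots p}) &= \frac{1}{n}\sum_\alpha (x_{1\alpha} - \bar{x}_1)(X_{1\cdot 23\ldots p,\alpha} - \bar{x}_1)\\
&= -\frac{R_{12}}{R_{11}} \cdot \frac{s_1}{s_2} \times r_{12}s_1s_2 - \frac{R_{13}}{R_{11}} \cdot \frac{s_1}{s_3} \times r_{13}s_1s_3 - \cdots - \frac{R_{1p}}{R_{11}} \cdot \frac{s_1}{s_p} \times r_{1p}s_1s_p\\
&= -\frac{s_1^2}{R_{11}}\,(r_{12}R_{12} + r_{13}R_{13} + \cdots + r_{1p}R_{1p})\\
&= -\frac{s_1^2}{R_{11}}\,(R - r_{11}R_{11}) = s_1^2\left(1 - \frac{R}{R_{11}}\right).
\end{aligned} \tag{13.20} \]
Hence
\[ r_{1\cdot 23\ldots p} = \left(1 - \frac{R}{R_{11}}\right)^{1/2}. \tag{13.20a} \]

The multiple correlation coefficient, being essentially a simple correlation coefficient, must lie between $-1$ and $+1$. But the covariance between $x_1$ and $X_{1\cdot 23\ldots p}$, being at the same time the variance of $X_{1\cdot 23\ldots p}$, has to be a non-negative quantity. Hence we have always
\[ 0 \le r_{1\cdot 23\ldots p} \le 1. \tag{13.21} \]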
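The two descriptions of the multiple correlation coefficient, as the simple correlation between $x_1$ and its predicted value (13.16) and as the determinant expression (13.20a), can be compared numerically. The sketch below assumes NumPy and reuses the cinchona data of Example 13.1; the variable names are illustrative.

```python
import numpy as np

# Data of Example 13.1: yield (x1), height (x2) and girth (x3) of 18 plants.
x1 = np.array([19, 51, 30, 42, 25, 18, 44, 56, 38,
               32, 25, 10, 20, 27, 13, 49, 27, 55], dtype=float)
x2 = np.array([8, 15, 11, 21, 7, 5, 10, 13, 12,
               13, 5, 6, 4, 8, 7, 12, 6, 16], dtype=float)
x3 = np.array([4, 5, 3, 3, 2, 1, 4, 6, 3,
               4, 2, 3, 4, 4, 3, 5, 3, 7], dtype=float)

# Least-square regression of x1 on x2 and x3 and the predicted values.
design = np.column_stack([np.ones(len(x1)), x2, x3])
coef = np.linalg.lstsq(design, x1, rcond=None)[0]
fitted = design @ coef

# (13.16): simple correlation between x1 and its predicted value.
r_from_fit = np.corrcoef(x1, fitted)[0, 1]

# (13.20a): (1 - R / R_11)^(1/2), R_11 being the determinant of the
# correlation matrix of the independent variables alone.
R = np.corrcoef(np.vstack([x1, x2, x3]))
r_from_det = np.sqrt(1 - np.linalg.det(R) / np.linalg.det(R[1:, 1:]))

print(r_from_fit, r_from_det)
```

Both values should coincide and lie between 0 and 1, in line with (13.21).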


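13.4 Some results relating to multiple regression and multiple correlation
Two results that we have already obtained in Section 13.3 are :
(a)
\[ \bar{X}_{1\cdot 23\ldots p} = \bar{x}_1, \]
implying that
\[ \bar{x}_{1\cdot 23\ldots p} = 0, \tag{13.17a} \]
and
(b)
\[ \operatorname{var}(X_{1\cdot 23\ldots p}) = s_1^2\left(1 - \frac{R}{R_{11}}\right) \tag{13.22} \]
\[ = s_1^2\, r_{1\cdot 23\ldots p}^2. \tag{13.22a} \]
(c) We also find from the normal equations that, for j = 2, 3, ..., p,
\[ \operatorname{cov}(x_j, x_{1\cdot 23\ldots p}) = \frac{1}{n}\sum_\alpha x_{j\alpha}\, x_{1\cdot 23\ldots p,\alpha} = 0. \tag{13.23} \]
As such, the residual part $x_{1\cdot 23\ldots p}$ is uncorrelated with each of the independent variables (and hence with the regression part $X_{1\cdot 23\ldots p}$).
(d) Since $x_1 = X_{1\cdot 23\ldots p} + x_{1\cdot 23\ldots p}$
and $\operatorname{cov}(X_{1\cdot 23\ldots p}, x_{1\cdot 23\ldots p}) = 0$,
as we have seen in (13.19) or as may be seen from (13.17a) and (13.23), we get
\[ \operatorname{var}(x_1) = \operatorname{var}(X_{1\cdot 23\ldots p}) + \operatorname{var}(x_{1\cdot 23\ldots p}). \tag{13.24} \]
Hence
\[ \operatorname{var}(x_{1\cdot 23\ldots p}) = s_1^2 - \operatorname{var}(X_{1\cdot 23\ldots p}) = s_1^2\,\frac{R}{R_{11}}. \tag{13.25} \]
Thus, denoting $\operatorname{var}(x_{1\cdot 23\ldots p})$ by $s_{1\cdot 23\ldots p}^2$, $s_{1\cdot 23\ldots p}$ being the standard error of estimate, we have
\[ s_{1\cdot 23\ldots p}^2 = s_1^2\,(1 - r_{1\cdot 23\ldots p}^2). \tag{13.25a} \]
Because of (13.22a) and (13.25a), we may write
\[ r_{1\cdot 23\ldots p}^2 = \frac{\operatorname{var}(X_{1\cdot 23\ldots p})}{\operatorname{var}(x_1)} \tag{13.26} \]
\[ = 1 - \frac{\operatorname{var}(x_{1\cdot 23\ldots p})}{\operatorname{var}(x_1)}. \tag{13.27} \]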


Now, $X_{1\cdot 23\ldots p}$ and $x_{1\cdot 23\ldots p}$ may be looked upon as, respectively, the part of $x_1$ that is accounted for and the part that is left unaccounted for by its multiple regression equation on $x_2, x_3, \ldots, x_p$. Hence $r_{1\cdot 23\ldots p}^2$ may be interpreted as the proportion of the total variance of $x_1$ that is explained by the multiple regression equation. Correspondingly, $1 - r_{1\cdot 23\ldots p}^2$ is the proportion of the total variance of $x_1$ that is left unexplained by the multiple regression equation.

Also, equation (13.25a) shows that $s_{1\cdot 23\ldots p}$ becomes smaller and smaller as $r_{1\cdot 23\ldots p}$ increases from zero to unity. When $r_{1\cdot 23\ldots p} = 1$, $s_{1\cdot 23\ldots p} = 0$, implying that $x_{1\alpha} = X_{1\cdot 23\ldots p,\alpha}$ for each $\alpha$, and here the multiple regression equation may be viewed as a perfect predicting formula. At the other extreme, if $r_{1\cdot 23\ldots p} = 0$, then $\operatorname{var}(X_{1\cdot 23\ldots p}) = 0$, implying that $X_{1\cdot 23\ldots p,\alpha} = \bar{x}_1$, i.e. is independent of $x_{2\alpha}, x_{3\alpha}, \ldots, x_{p\alpha}$. Hence here the equation fails completely as a predicting formula for $x_1$. That is why the multiple correlation coefficient, $r_{1\cdot 23\ldots p}$, may also be regarded as a measure of the efficacy of the multiple regression equation as a formula for predicting $x_1$ when $x_2, x_3, \ldots, x_p$ are given. The quantity $r_{1\cdot 23\ldots p}^2$, which is called the coefficient of determination* for the regression equation, may also be taken as such a measure.
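The decomposition (13.24) and the interpretation of the coefficient of determination can be illustrated as follows. This is a sketch only, assuming NumPy and the cinchona data of Example 13.1, with variances taken with divisor n as in the text.

```python
import numpy as np

# Data of Example 13.1: yield (x1), height (x2) and girth (x3) of 18 plants.
x1 = np.array([19, 51, 30, 42, 25, 18, 44, 56, 38,
               32, 25, 10, 20, 27, 13, 49, 27, 55], dtype=float)
x2 = np.array([8, 15, 11, 21, 7, 5, 10, 13, 12,
               13, 5, 6, 4, 8, 7, 12, 6, 16], dtype=float)
x3 = np.array([4, 5, 3, 3, 2, 1, 4, 6, 3,
               4, 2, 3, 4, 4, 3, 5, 3, 7], dtype=float)

design = np.column_stack([np.ones(len(x1)), x2, x3])
coef = np.linalg.lstsq(design, x1, rcond=None)[0]
fitted = design @ coef        # regression part of x1
resid = x1 - fitted           # residual part of x1

var_total = x1.var()          # variances with divisor n
var_fitted = fitted.var()
var_resid = resid.var()

# (13.23): the residual is uncorrelated with each independent variable.
cov_resid_x2 = np.mean(resid * (x2 - x2.mean()))
cov_resid_x3 = np.mean(resid * (x3 - x3.mean()))

# (13.26): coefficient of determination as the explained share of variance.
r_squared = var_fitted / var_total
print(var_total, var_fitted + var_resid, r_squared)
```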


Example 13.2 For the data of Example 13.1, the multiple correlation coefficient of weight of dry bark ($x_1$) on height ($x_2$) and girth at a height of 6" ($x_3$) may be computed. We have
\[ r_{12} = 0.768, \quad r_{13} = 0.719 \quad \text{and} \quad r_{23} = 0.520. \]
Hence the required multiple correlation coefficient is
\[ r_{1\cdot 23} = \sqrt{\frac{r_{12}^2 + r_{13}^2 - 2r_{12}r_{13}r_{23}}{1 - r_{23}^2}} = \sqrt{\frac{0.5325}{0.7296}} = \sqrt{0.7299} = 0.854. \]
It indicates that $x_2$ and $x_3$ have considerable influence on $x_1$. Viewed in a different way, it indicates that the multiple regression equation obtained in Example 13.1 serves as an excellent formula for predicting $x_1$ from given values of $x_2$ and $x_3$.
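The arithmetic of this example can be reproduced directly, and it is also easy to confirm that the three-variable formula is the same quantity as the determinant expression (13.20a). The following sketch assumes NumPy and uses only the three rounded correlations quoted above.

```python
import numpy as np

r12, r13, r23 = 0.768, 0.719, 0.520

# Formula used in the example (the case p = 3).
r1_23 = np.sqrt((r12**2 + r13**2 - 2 * r12 * r13 * r23) / (1 - r23**2))

# Same quantity from (13.20a): (1 - R / R_11)^(1/2).
R = np.array([[1.0, r12, r13],
              [r12, 1.0, r23],
              [r13, r23, 1.0]])
r1_23_det = np.sqrt(1 - np.linalg.det(R) / np.linalg.det(R[1:, 1:]))

print(round(r1_23, 3), round(r1_23_det, 3))
```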


*The term is used to mean the proportion of the variance of the dependent variable that is explained by its regression on the independent variable or variables.

13.5 Partial correlation
Sometimes the correlation between two variables, say $x_1$ and $x_2$, may be partly (or wholly) due to the influence of a group of variables, say $x_3, x_4, \ldots, x_p$, on both $x_1$ and $x_2$. In such a situation, one may want to know what the correlation between $x_1$ and $x_2$ would be if the effects of $x_3, x_4, \ldots, x_p$ on each of them were eliminated. This correlation is called the partial correlation or net correlation between $x_1$ and $x_2$ eliminating the effects of $x_3, x_4, \ldots, x_p$, as opposed to their simple (or total) correlation.

Consider the least-square linear regression equations of $x_1$ on $x_3, x_4, \ldots, x_p$ and of $x_2$ on $x_3, x_4, \ldots, x_p$. We may write
\[ x_1 = X_{1\cdot 34\ldots p} + x_{1\cdot 34\ldots p} \]
and
\[ x_2 = X_{2\cdot 34\ldots p} + x_{2\cdot 34\ldots p}, \]
where $X_{1\cdot 34\ldots p}$ and $X_{2\cdot 34\ldots p}$ are the predicted values of $x_1$ and $x_2$, $x_{1\cdot 34\ldots p}$ and $x_{2\cdot 34\ldots p}$ being the errors of estimation. Since $x_{1\cdot 34\ldots p}$ and $x_{2\cdot 34\ldots p}$ are uncorrelated with $x_3, x_4, \ldots, x_p$ [vide equation (13.23)], these may be looked upon as the parts of $x_1$ and $x_2$, respectively, which are unaffected by this group of variables. Hence the simple correlation coefficient between $x_{1\cdot 34\ldots p}$ and $x_{2\cdot 34\ldots p}$ may be used to measure the partial correlation of $x_1$ and $x_2$ eliminating the effects of $x_3, x_4, \ldots, x_p$ in so far as this can be done with the help of linear regression equations. This is known as a partial correlation coefficient and is denoted by $r_{12\cdot 34\ldots p}$.

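Thus, assuming $\operatorname{var}(x_{1\cdot 34\ldots p}) > 0$ and $\operatorname{var}(x_{2\cdot 34\ldots p}) > 0$, so that $R_{11}$ and $R_{22}$ are both positive, we have
\[ r_{12\cdot 34\ldots p} = \frac{\operatorname{cov}(x_{1\cdot 34\ldots p}, x_{2\cdot 34\ldots p})}{\sqrt{\operatorname{var}(x_{1\cdot 34\ldots p})\operatorname{var}(x_{2\cdot 34\ldots p})}}. \tag{13.28} \]
According to our notation,
\[ x_{1\cdot 34\ldots p} = (x_1 - \bar{x}_1) + \frac{R^{(2)}_{13}}{R^{(2)}_{11}} \cdot \frac{s_1}{s_3}(x_3 - \bar{x}_3) + \cdots + \frac{R^{(2)}_{1p}}{R^{(2)}_{11}} \cdot \frac{s_1}{s_p}(x_p - \bar{x}_p), \]
where $R^{(2)}_{ij}$ is the co-factor of $r_{ij}$ in $R^{(2)}$, the determinant obtained from $R$ by deleting the 2nd row and the 2nd column. Or, putting
\[ u_i = x_i - \bar{x}_i \tag{13.29} \]
and
\[ u_{1\cdot 34\ldots p} = x_{1\cdot 34\ldots p} - \bar{x}_{1\cdot 34\ldots p} = x_{1\cdot 34\ldots p}, \tag{13.30} \]
we have
\[ u_{1\cdot 34\ldots p} = u_1 + \frac{R^{(2)}_{13}}{R^{(2)}_{11}} \cdot \frac{s_1}{s_3}\, u_3 + \cdots + \frac{R^{(2)}_{1p}}{R^{(2)}_{11}} \cdot \frac{s_1}{s_p}\, u_p. \]

Similarly, putting
\[ u_{2\cdot 34\ldots p} = x_{2\cdot 34\ldots p} - \bar{x}_{2\cdot 34\ldots p} = x_{2\cdot 34\ldots p}, \tag{13.30a} \]
we have
\[ u_{2\cdot 34\ldots p} = u_2 + \frac{R^{(1)}_{23}}{R^{(1)}_{22}} \cdot \frac{s_2}{s_3}\, u_3 + \cdots + \frac{R^{(1)}_{2p}}{R^{(1)}_{22}} \cdot \frac{s_2}{s_p}\, u_p, \]
where $R^{(1)}$ is the determinant obtained from $R$ by deleting the 1st row and the 1st column. Analogously to (13.25), we have
\[ \operatorname{var}(x_{1\cdot 34\ldots p}) = s_1^2\, \frac{R^{(2)}}{R^{(2)}_{11}} \tag{13.31} \]
and
\[ \operatorname{var}(x_{2\cdot 34\ldots p}) = s_2^2\, \frac{R^{(1)}}{R^{(1)}_{22}}. \tag{13.32} \]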


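Now, owing to the normal equations determining $X_{1\cdot 34\ldots p}$ and $X_{2\cdot 34\ldots p}$, we have
\[ \sum_\alpha u_{1\cdot 34\ldots p,\alpha} = 0, \quad \sum_\alpha u_{2\cdot 34\ldots p,\alpha} = 0 \tag{13.33} \]
and
\[ \sum_\alpha u_{j\alpha}\, u_{1\cdot 34\ldots p,\alpha} = 0, \quad \sum_\alpha u_{j\alpha}\, u_{2\cdot 34\ldots p,\alpha} = 0, \tag{13.34} \]
for j = 3, 4, ..., p.
Hence
\[ \begin{aligned}
n \operatorname{cov}(x_{1\cdot 34\ldots p}, x_{2\cdot 34\ldots p}) &= \sum_\alpha u_{1\cdot 34\ldots p,\alpha}\, u_{2\cdot 34\ldots p,\alpha} = \sum_\alpha u_{1\cdot 34\ldots p,\alpha}\, u_{2\alpha}\\
&= \sum_\alpha u_{1\alpha}u_{2\alpha} + \frac{R^{(2)}_{13}}{R^{(2)}_{11}} \cdot \frac{s_1}{s_3}\sum_\alpha u_{3\alpha}u_{2\alpha} + \cdots + \frac{R^{(2)}_{1p}}{R^{(2)}_{11}} \cdot \frac{s_1}{s_p}\sum_\alpha u_{p\alpha}u_{2\alpha}
\end{aligned} \]
or (since $\sum_\alpha u_{i\alpha}u_{j\alpha} = n\operatorname{cov}(x_i, x_j) = n r_{ij} s_i s_j$)
\[ \operatorname{cov}(x_{1\cdot 34\ldots p}, x_{2\cdot 34\ldots p}) = \frac{s_1 s_2}{R^{(2)}_{11}}\left( r_{12}R^{(2)}_{11} + r_{32}R^{(2)}_{13} + \cdots + r_{p2}R^{(2)}_{1p} \right) = -\frac{R_{12}}{R^{(2)}_{11}}\, s_1 s_2. \tag{13.35} \]
This is so because
$r_{12}R^{(2)}_{11} + r_{32}R^{(2)}_{13} + \cdots + r_{p2}R^{(2)}_{1p}$
= determinant obtainable from $R^{(2)}$ by replacing its first row, $(r_{11}, r_{13}, \ldots, r_{1p})$, with $(r_{21}, r_{23}, \ldots, r_{2p})$
\[ = \begin{vmatrix} r_{21} & r_{23} & \cdots & r_{2p} \\ r_{31} & r_{33} & \cdots & r_{3p} \\ \vdots & \vdots & & \vdots \\ r_{p1} & r_{p3} & \cdots & r_{pp} \end{vmatrix} = \text{minor of } r_{12} \text{ in } R = -R_{12}. \]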

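Thus, in terms of the simple (or total) correlation coefficients $r_{ij}$,
\[ r_{12\cdot 34\ldots p} = \frac{-\left(R_{12}/R^{(2)}_{11}\right) s_1 s_2}{\sqrt{\left(R^{(2)}/R^{(2)}_{11}\right)\left(R^{(1)}/R^{(1)}_{22}\right)}\; s_1 s_2} = -\frac{R_{12}}{\sqrt{R_{11}R_{22}}}, \tag{13.36} \]
since
\[ R^{(2)} = R_{22}, \quad R^{(1)} = R_{11} \quad \text{and} \quad R^{(2)}_{11} = R^{(1)}_{22}, \]
both being obtainable from $R$ by deleting its first two rows and first two columns.
In particular, with
\[ R = \begin{vmatrix} r_{11} & r_{12} & r_{13} \\ r_{21} & r_{22} & r_{23} \\ r_{31} & r_{32} & r_{33} \end{vmatrix}, \]
we have
\[ -R_{12} = \begin{vmatrix} r_{21} & r_{23} \\ r_{31} & r_{33} \end{vmatrix} = r_{12} - r_{13}r_{23}, \]
\[ R_{11} = \begin{vmatrix} r_{22} & r_{23} \\ r_{32} & r_{33} \end{vmatrix} = 1 - r_{23}^2, \]
\[ R_{22} = \begin{vmatrix} r_{11} & r_{13} \\ r_{31} & r_{33} \end{vmatrix} = 1 - r_{13}^2, \]
so that
\[ r_{12\cdot 3} = \frac{r_{12} - r_{13}r_{23}}{\sqrt{1 - r_{13}^2}\,\sqrt{1 - r_{23}^2}}, \tag{13.37} \]
a result which could be obtained directly from the regression equation of $x_1$ on $x_3$ and that of $x_2$ on $x_3$.
Unlike the multiple correlation coefficient,
\[ -1 \le r_{12\cdot 34\ldots p} \le 1. \tag{13.38} \]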
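Formula (13.36) lends itself to a direct computation from the correlation matrix. The sketch below, which assumes NumPy and uses the rounded correlations $r_{12} = 0.768$, $r_{13} = 0.719$, $r_{23} = 0.520$ found for the cinchona data, evaluates a partial correlation from co-factors and compares it with the three-variable form (13.37). The function names `cofactor` and `partial_corr` are illustrative.

```python
import numpy as np

def cofactor(m, i, j):
    """Co-factor of element (i, j) of the square matrix m (0-based indices)."""
    minor = np.delete(np.delete(m, i, axis=0), j, axis=1)
    return (-1) ** (i + j) * np.linalg.det(minor)

def partial_corr(R, i, j):
    """Partial correlation of variables i and j, eliminating all the other
    variables in the correlation matrix R, by the rule -R_ij / sqrt(R_ii R_jj)."""
    return -cofactor(R, i, j) / np.sqrt(cofactor(R, i, i) * cofactor(R, j, j))

r12, r13, r23 = 0.768, 0.719, 0.520
R = np.array([[1.0, r12, r13],
              [r12, 1.0, r23],
              [r13, r23, 1.0]])

r12_3 = partial_corr(R, 0, 1)   # x1 and x2, eliminating x3
r13_2 = partial_corr(R, 0, 2)   # x1 and x3, eliminating x2

# Direct evaluation of (13.37) for comparison.
r12_3_direct = (r12 - r13 * r23) / np.sqrt((1 - r13**2) * (1 - r23**2))
print(r12_3, r12_3_direct, r13_2)
```

The same function works for any number of variables, since the co-factor rule does not depend on p.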


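13.6 Some relations connecting partial regression and partial correlation coefficients
We have seen in (13.12) that
\[ b_{12\cdot 34\ldots p} = -\frac{R_{12}}{R_{11}} \cdot \frac{s_1}{s_2}. \]
Also,
\[ s_{1\cdot 23\ldots p}^2 = s_1^2\, \frac{R}{R_{11}}, \]
\[ s_{2\cdot 13\ldots p}^2 = s_2^2\, \frac{R}{R_{22}} \]
and
\[ r_{12\cdot 34\ldots p} = -\frac{R_{12}}{\sqrt{R_{11}R_{22}}}. \]
Hence
\[ b_{12\cdot 34\ldots p} = r_{12\cdot 34\ldots p}\, \frac{s_{1\cdot 23\ldots p}}{s_{2\cdot 13\ldots p}}. \tag{13.39} \]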
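Again, since
\[ s_{1\cdot 34\ldots p}^2 = s_1^2\, \frac{R^{(2)}}{R^{(2)}_{11}}, \]
\[ s_{2\cdot 34\ldots p}^2 = s_2^2\, \frac{R^{(1)}}{R^{(1)}_{22}}, \]
and $R^{(1)} = R_{11}$, $R^{(2)} = R_{22}$ and $R^{(2)}_{11} = R^{(1)}_{22}$,
\[ \begin{aligned}
b_{12\cdot 34\ldots p} &= -\frac{R_{12}}{\sqrt{R_{11}R_{22}}} \times \frac{\sqrt{R_{22}}\, s_1}{\sqrt{R_{11}}\, s_2}\\
&= r_{12\cdot 34\ldots p} \times \frac{\sqrt{R^{(2)}/R^{(2)}_{11}}\; s_1}{\sqrt{R^{(1)}/R^{(1)}_{22}}\; s_2}\\
&= r_{12\cdot 34\ldots p}\, \frac{s_{1\cdot 34\ldots p}}{s_{2\cdot 34\ldots p}},
\end{aligned} \tag{13.40} \]
a relation of the same form as
\[ b_{12} = r_{12}\, \frac{s_1}{s_2}, \]
with just the secondary suffixes 3, 4, ..., p added to each term.
We have also, analogously to the relation $b_{12} = \operatorname{cov}(x_1, x_2)/\operatorname{var}(x_2)$,
\[ b_{12\cdot 34\ldots p} = \frac{\operatorname{cov}(x_{1\cdot 34\ldots p}, x_{2\cdot 34\ldots p})}{\operatorname{var}(x_{2\cdot 34\ldots p})}. \tag{13.40a} \]
Interchanging the suffixes 1 and 2 in (13.40), we have an analogous expression for $b_{21\cdot 34\ldots p}$, and the two lead to
\[ r_{12\cdot 34\ldots p}^2 = b_{12\cdot 34\ldots p}\, b_{21\cdot 34\ldots p}, \tag{13.41} \]
just as
\[ r_{12}^2 = b_{12}b_{21}. \]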

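13.7 Expression of a multiple correlation coefficient in terms of total and partial correlation coefficients
We have, using the notation introduced in the last section,
\[ \begin{aligned}
\sum_\alpha u_{1\cdot 23\ldots p,\alpha}^2 &= \sum_\alpha u_{1\cdot 23\ldots(p-1),\alpha}\, u_{1\cdot 23\ldots p,\alpha}\;{}^{*}\\
&= \sum_\alpha u_{1\cdot 23\ldots(p-1),\alpha}\left[ u_{1\alpha} - b_{12\cdot 34\ldots p}\, u_{2\alpha} - \cdots - b_{1p\cdot 23\ldots(p-1)}\, u_{p\alpha} \right]\\
&= \sum_\alpha u_{1\cdot 23\ldots(p-1),\alpha}\, u_{1\alpha} - b_{1p\cdot 23\ldots(p-1)} \sum_\alpha u_{1\cdot 23\ldots(p-1),\alpha}\, u_{p\alpha}\\
&= \sum_\alpha u_{1\cdot 23\ldots(p-1),\alpha}^2 - b_{1p\cdot 23\ldots(p-1)} \sum_\alpha u_{1\cdot 23\ldots(p-1),\alpha}\, u_{p\cdot 23\ldots(p-1),\alpha}
\end{aligned} \]
or, on dividing both sides by n and applying (13.40a) and (13.41),
\[ s_{1\cdot 23\ldots p}^2 = \left[ 1 - b_{1p\cdot 23\ldots(p-1)}\, b_{p1\cdot 23\ldots(p-1)} \right] s_{1\cdot 23\ldots(p-1)}^2 = \left( 1 - r_{1p\cdot 23\ldots(p-1)}^2 \right) s_{1\cdot 23\ldots(p-1)}^2. \tag{13.42} \]
This shows incidentally that
\[ s_{1\cdot 23\ldots p} \le s_{1\cdot 23\ldots(p-1)} \tag{13.42a} \]
or, equivalently, that
\[ r_{1\cdot 23\ldots p} \ge r_{1\cdot 23\ldots(p-1)}. \tag{13.42b} \]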


So by introducing an additional independent variable in the multiple regression equation, one can only improve its usefulness as a predicting formula.
Also, applying equation (13.42) successively to $s_{1\cdot 23\ldots(p-1)}^2, s_{1\cdot 23\ldots(p-2)}^2, \ldots, s_{1\cdot 2}^2$, we have
\[ s_{1\cdot 23\ldots p}^2 = (1 - r_{12}^2)(1 - r_{13\cdot 2}^2) \cdots (1 - r_{1p\cdot 23\ldots(p-1)}^2)\, s_1^2 \tag{13.43} \]
or
\[ 1 - r_{1\cdot 23\ldots p}^2 = (1 - r_{12}^2)(1 - r_{13\cdot 2}^2) \cdots (1 - r_{1p\cdot 23\ldots(p-1)}^2). \tag{13.43a} \]

Each of the factors on the right-hand side being less than or equal to unity, the multiple correlation coefficient $r_{1\cdot 23\ldots p}$ must be numerically at least as high as any of the total or partial correlation coefficients of $x_1$ with the independent variables.

*This is because of the normal equations determining the residuals. In general, when one takes the sum of such products, the secondary suffixes of one of the factors being common to those of the other, one can drop any or all of the secondary suffixes of the former. Likewise, one can add to the secondary suffixes of the former any or all of the secondary suffixes of the latter.


Suppose, without any loss of generality, that p = 3. Since $s_1^2(1 - r_{12}^2)$ and $s_1^2(1 - r_{13}^2)$ are the residual variances if $x_1$ is estimated from $x_2$ and $x_3$ individually, while $s_1^2(1 - r_{1\cdot 23}^2)$ is the residual variance if $x_1$ is estimated from $x_2$ and $x_3$ taken together, it is obvious that the inclusion of an additional variable can only reduce the residual variance. Now, inclusion of $x_3$, when $x_2$ has already been taken, for predicting $x_1$, is worth while only when the resultant reduction in the residual variance is substantial. As (13.43a) indicates, this will be the case when the numerical value of $r_{13\cdot 2}$ is sufficiently large. This illustrates the importance of the partial correlation coefficient in deciding whether to include or not an additional independent variable in regression analysis.
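For p = 3 the factorisation (13.43a) reads $1 - r_{1\cdot 23}^2 = (1 - r_{12}^2)(1 - r_{13\cdot 2}^2)$, and it can be verified with the rounded correlations of the cinchona data. The sketch below uses only the standard library; the variable names are illustrative.

```python
from math import sqrt

r12, r13, r23 = 0.768, 0.719, 0.520

# Partial correlation of x1 and x3, eliminating x2 (form of (13.37)).
r13_2 = (r13 - r12 * r23) / sqrt((1 - r12**2) * (1 - r23**2))

# Squared multiple correlation of x1 on x2 and x3.
r1_23_sq = (r12**2 + r13**2 - 2 * r12 * r13 * r23) / (1 - r23**2)

unexplained_x2_only = 1 - r12**2          # share of variance left by x2 alone
unexplained_both = 1 - r1_23_sq           # share left by x2 and x3 together
product_form = (1 - r12**2) * (1 - r13_2**2)

print(round(unexplained_x2_only, 4), round(unexplained_both, 4),
      round(product_form, 4))
```

The last two printed values should agree, and the drop from the first to the second shows how much of the residual variance is removed by bringing in $x_3$.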


Example 13.3 Let us consider the data of Example 13.1 again. The partial correlation coefficient of $x_1$ (yield of dry bark) and $x_2$ (height of plant), the effect of $x_3$ (girth at a height of 6") being accounted for, is
\[ r_{12\cdot 3} = \frac{r_{12} - r_{13}r_{23}}{\sqrt{1 - r_{13}^2}\,\sqrt{1 - r_{23}^2}} = \frac{0.394}{\sqrt{0.4830}\,\sqrt{0.7296}} = \frac{0.394}{0.695 \times 0.854} = 0.663. \]
The partial correlation coefficient of $x_1$ and $x_3$, eliminating the effect of $x_2$, is
\[ r_{13\cdot 2} = \frac{r_{13} - r_{12}r_{23}}{\sqrt{1 - r_{12}^2}\,\sqrt{1 - r_{23}^2}} = \frac{0.320}{\sqrt{0.4102}\,\sqrt{0.7296}} = \frac{0.320}{0.640 \times 0.854} = 0.585. \]

These values may be considered together with the total correlation coefficients
\[ r_{12} = 0.768 \quad \text{and} \quad r_{13} = 0.719. \]

Since $r_{12}$ is quite large, one will naturally take $x_2$ as an independent variable for predicting $x_1$. The partial correlation $r_{13\cdot 2}$, being equal to 0.585, indicates that the inclusion of $x_3$ as an independent variable, in addition to $x_2$, would be worth while as it would considerably increase the accuracy of prediction.


Example 13.4 Ramakrishnan [Sankhyā, 2 (1935-36), pp. 43-54] considered annual data on the yield-rate of cotton, September rainfall, November rainfall and November maximum temperature for an Indian district with a view to building up a forecasting formula for the first variable. To make the data free from time-dependent components, the ratio of each figure to the 5-year moving average value was taken. The new variables will be called $x_1$, $x_2$, $x_3$ and $x_4$, respectively.

The total correlations were found to be :

    r12 =  0.410        r23 =  0.287
    r13 =  0.307        r24 = -0.239
    r14 = -0.619        r34 = -0.517

Since, of the total correlations $r_{12}$, $r_{13}$ and $r_{14}$, the last one is numerically quite high and far higher than the other two, one should, of course, take $x_4$ as an independent variable in the regression equation of $x_1$.

Also,
\[ r_{12\cdot 4} = \frac{r_{12} - r_{14}r_{24}}{\sqrt{1 - r_{14}^2}\,\sqrt{1 - r_{24}^2}} = 0.3437, \]
\[ r_{13\cdot 4} = \frac{r_{13} - r_{14}r_{34}}{\sqrt{1 - r_{14}^2}\,\sqrt{1 - r_{34}^2}} = -0.01934 \]
and
\[ r_{23\cdot 4} = \frac{r_{23} - r_{24}r_{34}}{\sqrt{1 - r_{24}^2}\,\sqrt{1 - r_{34}^2}} = 0.1966. \]

The partial correlation $r_{12\cdot 4}$ is thus fairly high and much higher numerically than $r_{13\cdot 4}$. Hence it would be advisable to include $x_2$ as an independent variable in addition to $x_4$.
As to the other variable, $x_3$, its inclusion will not be worth while since
\[ r_{13\cdot 24} = \frac{r_{13\cdot 4} - r_{12\cdot 4}\, r_{23\cdot 4}}{\sqrt{1 - r_{12\cdot 4}^2}\,\sqrt{1 - r_{23\cdot 4}^2}} = -0.0944, \]
which is numerically a negligibly small quantity.


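13.8 Expression of a higher-order coefficient in terms of coefficients of a lower order
We have
\[ \begin{aligned}
\sum_\alpha u_{1\cdot 34\ldots p,\alpha}\, u_{2\cdot 34\ldots p,\alpha} &= \sum_\alpha u_{1\cdot 34\ldots(p-1),\alpha}\, u_{2\cdot 34\ldots p,\alpha}\\
&= \sum_\alpha u_{1\cdot 34\ldots(p-1),\alpha}\left[ u_{2\alpha} - b_{23\cdot 45\ldots p}\, u_{3\alpha} - b_{24\cdot 35\ldots p}\, u_{4\alpha} - \cdots - b_{2p\cdot 34\ldots(p-1)}\, u_{p\alpha} \right]\\
&= \sum_\alpha u_{1\cdot 34\ldots(p-1),\alpha}\, u_{2\alpha} - b_{2p\cdot 34\ldots(p-1)} \sum_\alpha u_{1\cdot 34\ldots(p-1),\alpha}\, u_{p\alpha}\\
&= \sum_\alpha u_{1\cdot 34\ldots(p-1),\alpha}\, u_{2\cdot 34\ldots(p-1),\alpha} - b_{2p\cdot 34\ldots(p-1)} \sum_\alpha u_{1\cdot 34\ldots(p-1),\alpha}\, u_{p\cdot 34\ldots(p-1),\alpha}.
\end{aligned} \]
Dividing both sides by n, we get
\[ \operatorname{cov}(x_{1\cdot 34\ldots p}, x_{2\cdot 34\ldots p}) = \operatorname{cov}(x_{1\cdot 34\ldots(p-1)}, x_{2\cdot 34\ldots(p-1)}) - b_{2p\cdot 34\ldots(p-1)} \operatorname{cov}(x_{1\cdot 34\ldots(p-1)}, x_{p\cdot 34\ldots(p-1)}), \]
or, using equation (13.40a),
\[ b_{12\cdot 34\ldots p}\, s_{2\cdot 34\ldots p}^2 = \left[ b_{12\cdot 34\ldots(p-1)} - b_{1p\cdot 34\ldots(p-1)}\, b_{p2\cdot 34\ldots(p-1)} \right] s_{2\cdot 34\ldots(p-1)}^2, \]
since
\[ b_{2p\cdot 34\ldots(p-1)} = b_{p2\cdot 34\ldots(p-1)}\, s_{2\cdot 34\ldots(p-1)}^2 \big/ s_{p\cdot 34\ldots(p-1)}^2. \]
Also, from (13.41) and (13.42),
\[ s_{2\cdot 34\ldots p}^2 = \left( 1 - r_{2p\cdot 34\ldots(p-1)}^2 \right) s_{2\cdot 34\ldots(p-1)}^2 = \left( 1 - b_{2p\cdot 34\ldots(p-1)}\, b_{p2\cdot 34\ldots(p-1)} \right) s_{2\cdot 34\ldots(p-1)}^2. \]
Hence
\[ b_{12\cdot 34\ldots p} = \frac{b_{12\cdot 34\ldots(p-1)} - b_{1p\cdot 34\ldots(p-1)}\, b_{p2\cdot 34\ldots(p-1)}}{1 - b_{2p\cdot 34\ldots(p-1)}\, b_{p2\cdot 34\ldots(p-1)}}. \tag{13.44} \]
Applying equation (13.40) and simplifying, we have also
\[ r_{12\cdot 34\ldots p} = \frac{r_{12\cdot 34\ldots(p-1)} - r_{1p\cdot 34\ldots(p-1)}\, r_{2p\cdot 34\ldots(p-1)}}{\sqrt{\left( 1 - r_{1p\cdot 34\ldots(p-1)}^2 \right)\left( 1 - r_{2p\cdot 34\ldots(p-1)}^2 \right)}}. \tag{13.44a} \]
Equations (13.44) and (13.44a) enable us to compute a regression coefficient or correlation coefficient of order (p-2) from those of order (p-3).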
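The recurrence (13.44a) suggests a simple recursive computation: a partial correlation of any order is built up from those of the next lower order until total correlations are reached. The sketch below is one possible arrangement, using only the standard library and the total correlations quoted in Example 13.4; the function `partial_corr` and the dictionary layout are illustrative choices, not part of the text.

```python
from math import sqrt

def partial_corr(r, i, j, given=()):
    """Partial correlation of variables i and j eliminating those in `given`.

    `r` maps frozenset({a, b}) to the total correlation of a and b.
    The last variable in `given` is eliminated by the recurrence (13.44a).
    """
    if not given:
        return r[frozenset((i, j))]
    *rest, k = given
    rest = tuple(rest)
    r_ij = partial_corr(r, i, j, rest)
    r_ik = partial_corr(r, i, k, rest)
    r_jk = partial_corr(r, j, k, rest)
    return (r_ij - r_ik * r_jk) / sqrt((1 - r_ik**2) * (1 - r_jk**2))

# Total correlations of Example 13.4.
r = {
    frozenset((1, 2)): 0.410, frozenset((1, 3)): 0.307,
    frozenset((1, 4)): -0.619, frozenset((2, 3)): 0.287,
    frozenset((2, 4)): -0.239, frozenset((3, 4)): -0.517,
}

r12_4 = partial_corr(r, 1, 2, (4,))
r13_4 = partial_corr(r, 1, 3, (4,))
r23_4 = partial_corr(r, 2, 3, (4,))
# A second-order coefficient; the order of elimination should not matter.
r13_24 = partial_corr(r, 1, 3, (2, 4))
r13_42 = partial_corr(r, 1, 3, (4, 2))
print(r12_4, r13_4, r23_4, r13_24)
```

The recursion recomputes lower-order coefficients several times, which is harmless for a handful of variables; for many variables a matrix method would be preferable.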




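13.9 Expression of a lower-order coefficient in terms of coefficients of a higher order
Because of the normal equations, we also have
\[ \sum_\alpha u_{1\cdot 23\ldots p,\alpha}\, u_{2\cdot 34\ldots(p-1),\alpha} = 0 \]
or
\[ \sum_\alpha \left[ u_{1\alpha} - b_{12\cdot 34\ldots p}\, u_{2\alpha} - b_{13\cdot 24\ldots p}\, u_{3\alpha} - \cdots - b_{1p\cdot 23\ldots(p-1)}\, u_{p\alpha} \right] u_{2\cdot 34\ldots(p-1),\alpha} = 0 \]
or
\[ \sum_\alpha u_{1\alpha}\, u_{2\cdot 34\ldots(p-1),\alpha} - b_{12\cdot 34\ldots p} \sum_\alpha u_{2\alpha}\, u_{2\cdot 34\ldots(p-1),\alpha} - b_{1p\cdot 23\ldots(p-1)} \sum_\alpha u_{p\alpha}\, u_{2\cdot 34\ldots(p-1),\alpha} = 0 \]
or
\[ \sum_\alpha u_{1\cdot 34\ldots(p-1),\alpha}\, u_{2\cdot 34\ldots(p-1),\alpha} - b_{12\cdot 34\ldots p} \sum_\alpha u_{2\cdot 34\ldots(p-1),\alpha}^2 - b_{1p\cdot 23\ldots(p-1)} \sum_\alpha u_{p\cdot 34\ldots(p-1),\alpha}\, u_{2\cdot 34\ldots(p-1),\alpha} = 0. \]
Hence, on dividing by n and using (13.40a) and simplifying,
\[ b_{12\cdot 34\ldots(p-1)} = b_{12\cdot 34\ldots p} + b_{1p\cdot 23\ldots(p-1)}\, b_{p2\cdot 34\ldots(p-1)}. \tag{13.45} \]
Also, if we interchange 1 and p in the suffixes, then
\[ b_{p2\cdot 34\ldots(p-1)} = b_{p2\cdot 13\ldots(p-1)} + b_{p1\cdot 23\ldots(p-1)}\, b_{12\cdot 34\ldots(p-1)}. \]
Substituting this value for $b_{p2\cdot 34\ldots(p-1)}$ in (13.45), we have ultimately an expression for a regression coefficient of order (p-3) in terms of those of order (p-2), viz.
\[ b_{12\cdot 34\ldots(p-1)} = \frac{b_{12\cdot 34\ldots p} + b_{1p\cdot 23\ldots(p-1)}\, b_{p2\cdot 13\ldots(p-1)}}{1 - b_{1p\cdot 23\ldots(p-1)}\, b_{p1\cdot 23\ldots(p-1)}}. \tag{13.45a} \]

Correspondingly, we have from (13.45a) the following expression for a correlation coefficient of order (p-3) in terms of those of order (p-2) :
\[ r_{12\cdot 34\ldots(p-1)} = \frac{r_{12\cdot 34\ldots p} + r_{1p\cdot 23\ldots(p-1)}\, r_{2p\cdot 13\ldots(p-1)}}{\sqrt{\left( 1 - r_{1p\cdot 23\ldots(p-1)}^2 \right)\left( 1 - r_{2p\cdot 13\ldots(p-1)}^2 \right)}}. \tag{13.46} \]
In particular, with p = 3, we get
\[ b_{12} = \frac{b_{12\cdot 3} + b_{13\cdot 2}\, b_{32\cdot 1}}{1 - b_{13\cdot 2}\, b_{31\cdot 2}} \tag{13.47} \]
and
\[ r_{12} = \frac{r_{12\cdot 3} + r_{13\cdot 2}\, r_{23\cdot 1}}{\sqrt{(1 - r_{13\cdot 2}^2)(1 - r_{23\cdot 1}^2)}}. \tag{13.48} \]
The attention of the reader is drawn particularly to equations (13.37) and (13.48).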


Suppose for a set of data 
Tyq-3=0. 
We have then 
Ne=Ns" es 


which will not be zero if x, has non-zero correlation with x, and 
%, Thus, although x, and x, may be uncorrelated when the effect 
of x, is eliminated from each, they may appear to be correlated when 
this is not done (i.e. when the influence of x, on them is ignored). 
This shows that one should be cautious in taking a non-zero total 
correlation between two variables as indicative of a causal relationship. 
For the apparent relationship may really be due to the influence of 
another variable (or a group of variables) on both of them. 

In the same way, the absence of correlation between two 
variables as shown by their total correlation coefficient may only be 
apparent. For $r_{12}$ may be zero but $r_{12.3}$, for instance, may not be,
since in that case

$$r_{12.3} = -\,r_{13.2}\, r_{23.1},$$
which may not vanish unless at least one of $r_{13.2}$ and $r_{23.1}$ does.

Besides, (13.37) and (13.48) also show that a total correlation 
coefficient may have a sign opposite to that of the corresponding 
partial coefficient. 
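The three situations described above can be illustrated with a few lines of code. This is only a sketch: it assumes the usual first-order formula $r_{12.3} = (r_{12} - r_{13}r_{23})/\sqrt{(1-r_{13}^2)(1-r_{23}^2)}$, and every set of total correlations in it is hypothetical, chosen merely so that each corresponds to a valid correlation matrix.

```python
from math import sqrt

def partial_r12_3(r12, r13, r23):
    """Partial correlation of x1 and x2, eliminating x3."""
    return (r12 - r13 * r23) / sqrt((1 - r13 ** 2) * (1 - r23 ** 2))

# Hypothetical total correlations (r12, r13, r23), one set per situation.
cases = {
    "non-zero total, zero partial": (0.30, 0.6, 0.5),
    "zero total, non-zero partial": (0.00, 0.6, 0.5),
    "total and partial of opposite sign": (0.20, 0.8, 0.6),
}

results = {name: partial_r12_3(*totals) for name, totals in cases.items()}
for name, value in results.items():
    print(f"{name}: r12 = {cases[name][0]:+.2f}, r12.3 = {value:+.3f}")
```

In the first case the whole of the total correlation between $x_1$ and $x_2$ is accounted for by $x_3$; in the other two, ignoring $x_3$ hides or reverses the relationship seen once $x_3$ is held fixed.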


13.10 Multivariate normal distribution 

As in the univariate or the bivariate case, here too, we have to 
consider for some purposes theoretical distributions that may serve 
as simplified models of observed multivariate distributions. 


For continuous variables $x_1, x_2, \ldots, x_p$, this distribution will be
represented by a probability-density function, say $f(x_1, x_2, \ldots, x_p)$,
such that

$$P[a_1 < x_1 < b_1,\ a_2 < x_2 < b_2,\ \ldots,\ a_p < x_p < b_p] = \int_{a_1}^{b_1}\int_{a_2}^{b_2}\cdots\int_{a_p}^{b_p} f(x_1, x_2, \ldots, x_p)\, dx_p \cdots dx_2\, dx_1.$$

Taking, with no loss of generality, the range of possible values of
$x_i$ to be from $-\infty$ to $\infty$ ($i = 1, 2, \ldots, p$), we should note that
the p.d.f. has to satisfy the conditions:




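$$f(x_1, x_2, \ldots, x_p) \ge 0, \text{ for all values of } x_1, x_2, \ldots, x_p, \qquad (13.49)$$

and $$\int_{-\infty}^{\infty}\int_{-\infty}^{\infty}\cdots\int_{-\infty}^{\infty} f(x_1, x_2, \ldots, x_p)\, dx_1\, dx_2 \cdots dx_p = 1. \qquad (13.50)$$

We need consider here just one theoretical distribution of the
continuous type, viz. the multivariate (p-variate) normal distribution,
whose p.d.f. is

$$f(x_1, x_2, \ldots, x_p) = \frac{1}{(2\pi)^{p/2}\,|\sigma|^{1/2}} \exp\left[-\tfrac{1}{2}\sum_{i=1}^{p}\sum_{j=1}^{p} \sigma^{ij}(x_i-\mu_i)(x_j-\mu_j)\right], \quad -\infty < x_1, x_2, \ldots, x_p < \infty, \qquad (13.51)$$

where $$\mu_i = E(x_i), \qquad (13.52)$$

$$\sigma_{ij} = \begin{cases} \mathrm{var}(x_i) = \sigma_i^2 & \text{if } i = j, \\ \mathrm{cov}(x_i, x_j) = \rho_{ij}\sigma_i\sigma_j & \text{if } i \ne j, \end{cases} \qquad (13.53)$$

$|\sigma|$ = determinant of $(\sigma_{ij})$, the variance-covariance
matrix (supposed to be positive definite), $\qquad$ (13.54)

and $$(\sigma^{ij}) = (\sigma_{ij})^{-1}. \qquad (13.55)$$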

The more important properties of this distribution are as follows : 

(a) The marginal distribution of any $p'$ ($1 \le p' \le p-1$) of the
variables is $p'$-variate normal.

(b) The conditional distribution of any $p'$ variables for fixed
values of any $p''$ other variables ($p', p'' \ge 1$; $p' + p'' \le p$) is also
$p'$-variate normal.

(c) Consider the conditional distribution of any one variable,
say $x_1$, for fixed values of $x_2, x_3, \ldots, x_p$. The mean and variance
of the distribution are given by:

$$E(x_1 \mid x_2, x_3, \ldots, x_p) = \mu_1 - \sum_{j=2}^{p} \frac{R_{1j}}{R_{11}}\,\frac{\sigma_1}{\sigma_j}\,(x_j - \mu_j) \qquad (13.56)$$

and $$\mathrm{var}(x_1 \mid x_2, x_3, \ldots, x_p) = \frac{R}{R_{11}}\,\sigma_1^2, \qquad (13.57)$$

where $R = |\rho|$ = determinant of the correlation matrix $(\rho_{ij})$
and $R_{ij}$ = co-factor of $\rho_{ij}$ in $R$.

From (13.56) and (13.57), we see that the true regression of $x_1$ on
$x_2, x_3, \ldots, x_p$ is linear and that the conditional distributions of $x_1$
for varying $x_2, x_3, \ldots, x_p$ are homoscedastic.




Writing, in accordance with (13.20),
$$\rho_{1.23\ldots p} = \left(1 - \frac{R}{R_{11}}\right)^{1/2},$$
we have
$$\mathrm{var}(x_1 \mid x_2, x_3, \ldots, x_p) = \left(1 - \rho^2_{1.23\ldots p}\right)\sigma_1^2, \qquad (13.58)$$

which shows the role of the multiple correlation coefficient $\rho_{1.23\ldots p}$ in
the context of a multivariate normal distribution. The higher the
multiple correlation, the smaller is the conditional variance.


(d) Again, consider the joint distribution of any two of the
variables, say $x_1$ and $x_2$, for fixed values of the others. The correlation
coefficient between $x_1$ and $x_2$ in this distribution is the same for
all values of $x_3, x_4, \ldots, x_p$, viz.

$$-\frac{R_{12}}{\sqrt{R_{11}R_{22}}}.$$

From (13.36), this is seen to be the partial correlation coefficient
$\rho_{12.34\ldots p}$. Thus under a multivariate normal set-up, the partial
correlation $\rho_{12.34\ldots p}$ is nothing but the correlation between $x_1$ and $x_2$
in the conditional distribution of $x_1$ and $x_2$ for each set of fixed
values of $x_3, x_4, \ldots, x_p$.
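Properties (c) and (d) lend themselves to a small numerical check for p = 3. The sketch below works from the co-factors of a correlation matrix, taking the conditional variance as $(R/R_{11})\sigma_1^2$, the squared multiple correlation as $1 - R/R_{11}$ and the conditional correlation of $x_1$ and $x_2$ as $-R_{12}/\sqrt{R_{11}R_{22}}$. The population values and the helper names `cofactor3` and `det3` are assumptions made purely for the illustration.

```python
from math import sqrt

def cofactor3(m, i, j):
    """Co-factor of element (i, j) of a 3x3 matrix held as nested lists."""
    rows = [r for r in range(3) if r != i]
    cols = [c for c in range(3) if c != j]
    minor = (m[rows[0]][cols[0]] * m[rows[1]][cols[1]]
             - m[rows[0]][cols[1]] * m[rows[1]][cols[0]])
    return (-1) ** (i + j) * minor

def det3(m):
    """Determinant by expansion along the first row."""
    return sum(m[0][j] * cofactor3(m, 0, j) for j in range(3))

# Hypothetical population correlations and standard deviation of x1.
rho12, rho13, rho23 = 0.6, 0.5, 0.4
sigma1 = 2.0
rho = [[1.0, rho12, rho13],
       [rho12, 1.0, rho23],
       [rho13, rho23, 1.0]]

R = det3(rho)
R11 = cofactor3(rho, 0, 0)
R22 = cofactor3(rho, 1, 1)
R12 = cofactor3(rho, 0, 1)

cond_var = R / R11 * sigma1 ** 2          # conditional variance of x1
multiple_sq = 1 - R / R11                 # square of the multiple correlation
partial_12_3 = -R12 / sqrt(R11 * R22)     # conditional correlation of x1, x2

print(round(cond_var, 4), round(multiple_sq, 4), round(partial_12_3, 4))
```

For three variables the co-factor expression for the partial correlation reduces to $(\rho_{12} - \rho_{13}\rho_{23})/\sqrt{(1-\rho_{13}^2)(1-\rho_{23}^2)}$, which gives a convenient cross-check on the output.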


Questions and exercises 


13.1 Obtain the multiple regression equation of $x_1$ on $x_2, x_3, \ldots,
x_p$ in terms of the means, the standard deviations and the inter-
correlations of the variables.

13.2 Define multiple correlation and partial correlation, and
indicate how they differ from simple (or total) correlation. Deduce
the formula for a multiple and a partial correlation coefficient in
terms of total correlation coefficients.

13.3 Prove the relation

$$1 - r^2_{1.23\ldots p} = (1 - r^2_{12})(1 - r^2_{13.2}) \cdots (1 - r^2_{1p.23\ldots(p-1)}).$$
Use this relation to show that the multiple correlation coefficient
is numerically greater than any of the total or partial correlation
coefficients of $x_1$ with the other variables.




13.4 Find what the value of $r_{1.23\ldots p}$ will be if the independent
variables are pair-wise uncorrelated. Ans. $(r^2_{12} + r^2_{13} + \cdots + r^2_{1p})^{1/2}$.

13.5 Show that if $r_{1i} = 0$ for each $i = 2, 3, \ldots, p$, then $r_{1.23\ldots p} = 0$,
and conversely. What is the significance of this result in regard to
the multiple regression equation of $x_1$ on $x_2, x_3, \ldots, x_p$?

13.6 (a) Show that $r_{12}$, $r_{13}$ and $r_{23}$ must satisfy the inequality
$$r^2_{12} + r^2_{13} + r^2_{23} - 2r_{12}r_{13}r_{23} \le 1.$$
[Hint: Use the inequality $r^2_{12.3} \le 1$.]
(b) Suppose a computor has found, for a given set of values
of $x_1$, $x_2$ and $x_3$,
$r_{12} = 0.91$, $r_{13} = 0.33$ and $r_{23} = 0.81$.
Examine whether his computations may be said to be free from
errors.

13.7 Suppose $x_1$, $x_2$ and $x_3$ satisfy the relation $a_1x_1 + a_2x_2 + a_3x_3 = k$.
(a) Determine the three total correlation coefficients in terms
of the standard deviations and the constants $a_1$, $a_2$ and $a_3$.
Ans. $r_{12} = \dfrac{a_3^2s_3^2 - a_1^2s_1^2 - a_2^2s_2^2}{2a_1a_2s_1s_2}$, etc.
(b) State what the partial correlation coefficients will be.
Partial ans. All are equal to $-1$, if $a_1$, $a_2$, $a_3$ are of the same sign.

13.8 Suppose all the total correlation coefficients of $x_1, x_2, \ldots,
x_p$ are equal to $r$. (a) What is the value of $r_{1.23\ldots p}$? (b) What are
the values of the partial correlation coefficients of successive orders?
Show that if $r$ be negative, then $r \ge -\dfrac{1}{p-1}$.
Partial ans. $r^2_{1.23\ldots p} = \dfrac{(p-1)r^2}{1 + (p-2)r}$; each partial correlation
coefficient of order $k$ is $r/(1 + kr)$.

13.9 Show that the multiple correlation coefficient $r_{1.23\ldots p}$ is the
highest possible value of the simple correlation coefficient between $x_1$
and a linear function of $x_2, x_3, \ldots, x_p$.
[Hint: Take the correlation coefficient of $x_1$ and $Y = a + b_2x_2
+ b_3x_3 + \cdots + b_px_p$. Show that the values of $b_2, b_3, \ldots, b_p$ maximising
this coefficient are proportional to the partial regression
coefficients of $x_1$ on $x_2, x_3, \ldots, x_p$, respectively.]




13.10 If $r_{1i} = r$ ($i = 2, 3, \ldots, p$) and $r_{ij} = r'$ ($i, j = 2, 3, \ldots, p$; $i \ne j$),
then what is $r_{1.23\ldots p}$?

Ans. $\left[\dfrac{(p-1)r^2}{1 + (p-2)r'}\right]^{1/2}$.

13.11 Let $x = u_1 + u_2 + \cdots + u_r + v_1 + v_2 + \cdots + v_s$ and
$y = u_1 + u_2 + \cdots + u_r + w_1 + w_2 + \cdots + w_t$, the u's, v's and w's being random
variables with unit variances and zero covariances. Show that the
correlation coefficient between $x$ and $y$ is $r/\sqrt{(r+s)(r+t)}$.

13.12 Let $r_{1(2.34\ldots p)}$ be the correlation between $x_1$ and $x_{2.34\ldots p}$, the
residual of $x_2$ removing the regression of $x_3, x_4, \ldots, x_p$. Show that
(i) $r^2_{1(2.34\ldots p)} \le r^2_{12.34\ldots p}$
and (ii) $r^2_{1.23\ldots p} = r^2_{1p} + r^2_{1(p-1.p)} + \cdots + r^2_{1(2.34\ldots p)}$.

[Hint to (ii): $\sum x^2_{1.23\ldots p} = \sum x_{1.34\ldots p}\left(x_{1.34\ldots p} - b_{12.34\ldots p}\,x_{2.34\ldots p}\right)$
$= \sum x^2_{1.34\ldots p} - b_{12.34\ldots p} \sum x_1 x_{2.34\ldots p}$,
implying that
$1 - r^2_{1.23\ldots p} = (1 - r^2_{1.34\ldots p}) - r^2_{1(2.34\ldots p)}$.]

13.13 What is a multivariate normal distribution? State its
important properties.

13.14 On the basis of observations made on 35 cotton plants,
the total correlations of yield of cotton ($x_1$), number of bolls, i.e.
seed-vessels, ($x_2$) and height ($x_3$) are found to be

$r_{12} = 0.863$, $r_{13} = 0.648$ and $r_{23} = 0.709$.
Determine the multiple correlation coefficient $r_{1.23}$ and the partial
correlation coefficients $r_{12.3}$ and $r_{13.2}$, and interpret your results.
Partial ans. $r_{1.23} = 0.865$; $r_{12.3} = 0.751$, $r_{13.2} = 0.101$.

13.15 The following constants are obtained from measurements
on length in mm. ($x_1$), volume in c.c. ($x_2$) and weight in gm. ($x_3$) of
300 eggs:

$\bar{x}_1 = 55.95$    $s_1 = 2.26$    $r_{12} = 0.578$
$\bar{x}_2 = 51.48$    $s_2 = 4.39$    $r_{13} = 0.581$
$\bar{x}_3 = 56.03$    $s_3 = 4.41$    $r_{23} = 0.974$
(a) Obtain the linear regression equation of egg-weight on
egg-length and egg-volume. Hence estimate the weight of an egg
whose length is 58.0 mm. and volume is 52.5 c.c.

Ans. $X_3 = 3.49 + 0.053x_1 + 0.963x_2$; 57.12 gm.




(b) Give a measure of the usefulness of the above regression
equation as a predicting formula.
(c) Compute the partial correlation coefficient of weight and
volume, eliminating the effect of length. Ans. 0.961.

13.16 For a large group of students of statistics, $x_1$ = score in
Theory, $x_2$ = score in Methods and $x_3$ = score in Lab. Work are
approximately normally distributed. Also,

$\bar{x}_1 = 50.4$    $s_1 = 6.9$    $r_{12} = 0.69$
$\bar{x}_2 = 45.1$    $s_2 = 6.4$    $r_{13} = 0.45$
$\bar{x}_3 = 53.3$    $s_3 = 6.8$    $r_{23} = 0.58$
Estimate the percentage of students whose total score exceeds 150.
[Hint: It can be proved that if $x_1, x_2, \ldots, x_p$ are distributed
in the (multivariate) normal form, then $a + b_1x_1 + b_2x_2 + \cdots + b_px_p$
is also normally distributed.] Ans. 47.2%.


SUGGESTED READING 


[1] Ezekiel, M. and Fox, K. A. Methods of Correlation and Regression 
Analysis (Chs. 10—15). John Wiley, 1959. 

[2] Goulden, C. H. Methods of Statistical Analysis (Ch. 8.). John 
Wiley, 1952, and Asia Publishing House, 1959. 

[3] Kenney, J. F. and Keeping, E. S. Mathematics of Statistics, Part II 
(Ch. 11). Van Nostrand, 1951, and Affiliated East-West Press. 

[4] Snedecor, G. W. Statistical Methods (Ch. 14). Iowa State 
College Press, 1956, and Allied Pacific, 1961. 

[5] Yule, G. U. and Kendall, M. G. An Introduction to the Theory of 
Statistics (Ch. 12). Charles Griffin, 1950. 


14 SOME OTHER TYPES OF CORRELATION


14.1 Rank correlation coefficient 

For calculating the product-moment correlation coefficient, it is
essential that measurements on the two characters are available.
But in many cases the characters may not be measurable or, even if
measurable, may not be measured for limitations of cost or time or
for lack of appropriate measuring instruments. Sometimes, although
measurements may be available making it possible to calculate the
product-moment correlation, a rough and ready substitute may still
be called for to reduce the arithmetical work involved. It may be
possible to use a rank correlation coefficient in all these situations.

Suppose that it is possible to arrange the individuals according 
to the degree to which they possess a character under enquiry, 
although the character may not be directly measurable. Thus, for 
example, a number of operators may be arranged in order of effici- 
ency by their supervisor, although it may not be easy to offer any 
numerical measure of efficiency. Such an ordered arrangement will 
be called a ranking and the ordinal number indicating the position of 
a given individual in the ranking is called its rank. A ranking where 
two or more individuals are allotted the same rank is called a tie. 
To be specific, a rank r means that with respect to the character
under enquiry r-1 individuals have the character in a higher degree
than the individual getting the rank r.


14.2 Spearman's rank correlation coefficient

First, let us consider the case where there are no ties. 

Suppose we have n individuals ranked according to two characters,
A and B, in the orders $u_1, u_2, \ldots, u_n$ and $v_1, v_2, \ldots, v_n$, respectively,
where the u's and the v's are permutations of the integers from 1 to n.
Our problem is to have a suitable measure of the degree of relationship
between u and v. Let $d_i = u_i - v_i$. The values of d give an
indication of the closeness of the correspondence between A and B.
We first observe that the relationship would be positively perfect




(i.e., there would be perfect agreement) if for each individual the
ranks in the first and second series would coincide, and in that case

$$\sum d_i^2 = \sum (u_i - v_i)^2 = 0.$$

Again, the relationship would be negatively perfect (i.e., there
would be perfect disagreement) if the ranking in the first case were
completely reversed in the second (i.e. if $v_i = n - u_i + 1$ for all i), and
in that case

$$\sum d_i^2 = \sum (2u_i - n - 1)^2 = 4\sum u_i^2 - 4(n+1)\sum u_i + n(n+1)^2$$

$$= 4\cdot\frac{n(n+1)(2n+1)}{6} - 4(n+1)\cdot\frac{n(n+1)}{2} + n(n+1)^2$$

$$= \frac{n(n^2-1)}{3},$$

since $\sum u_i$ is just the sum of the first n natural numbers and $\sum u_i^2$ is
the sum of their squares.

Spearman suggested as his coefficient of rank correlation (say, $r_R$)
the measure
$$r_R = 1 - \frac{6\sum d_i^2}{n(n^2-1)}. \qquad (14.1)$$
Obviously, $r_R = 1$ for the case of perfect agreement and $r_R = -1$
for the case of perfect disagreement between the two series of ranks.


This coefficient can also be deduced as a product-moment
coefficient. Taking u's and v's as variate-values, we have

$$\bar{u} = \frac{1}{n}\sum u_i = \frac{n+1}{2}.$$

Similarly, $$\bar{v} = \frac{n+1}{2}.$$

Also, $$s_u^2 = \frac{1}{n}\sum u_i^2 - \bar{u}^2 = \frac{1}{n}\cdot\frac{n(n+1)(2n+1)}{6} - \left(\frac{n+1}{2}\right)^2 = \frac{n^2-1}{12}$$

and, in the same way,
$$s_v^2 = \frac{n^2-1}{12}.$$

Now, we have
$$\frac{1}{n}\sum d_i^2 = \frac{1}{n}\sum \{(u_i - \bar{u}) - (v_i - \bar{v})\}^2 = s_u^2 + s_v^2 - 2\,\mathrm{cov}(u, v),$$
so that $$\mathrm{cov}(u, v) = \frac{n^2-1}{12} - \frac{1}{2n}\sum d_i^2.$$

Hence the simple correlation coefficient between u and v is

$$\frac{\mathrm{cov}(u, v)}{s_u s_v} = \frac{\dfrac{n^2-1}{12} - \dfrac{1}{2n}\sum d_i^2}{\dfrac{n^2-1}{12}} = 1 - \frac{6\sum d_i^2}{n(n^2-1)}. \qquad (14.2)$$

Thus $r_R$ may be regarded as the simple product-moment
correlation coefficient between the two series of ranks.
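The equivalence just stated is easy to confirm by direct computation. The sketch below evaluates the difference formula $1 - 6\sum d_i^2/\{n(n^2-1)\}$ and the ordinary product-moment coefficient on the same two untied rankings. The data are the two judges' rankings in the arrangement given in Example 14.3 (Judge 1 in natural order); the function names are illustrative.

```python
from math import sqrt

def spearman_from_differences(u, v):
    """1 - 6*sum(d^2) / (n(n^2 - 1)); valid only when there are no ties."""
    n = len(u)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(u, v))
    return 1 - 6 * sum_d2 / (n * (n * n - 1))

def product_moment(x, y):
    """Ordinary product-moment correlation, with divisor n throughout."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
    var_x = sum((a - mean_x) ** 2 for a in x) / n
    var_y = sum((b - mean_y) ** 2 for b in y) / n
    return cov / sqrt(var_x * var_y)

# Rankings of ten hand-writings, Judge 1 arranged in the natural order.
judge1 = list(range(1, 11))
judge2 = [2, 1, 6, 5, 7, 9, 10, 4, 8, 3]

r_formula = spearman_from_differences(judge1, judge2)
r_moment = product_moment(judge1, judge2)
print(round(r_formula, 3), round(r_moment, 3))
```

Both routes give the same value, since re-ordering the individuals changes neither the set of differences nor the product-moment coefficient.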

Next, let us consider the case of ties. If the same rank is allotted
to k individuals, then we have a tie of length k. If these k individuals
follow r other individuals in the ranking, then each may be
given the rank r+1.

But we shall follow the convention that each of the k individuals
is to be given the rank

$$\frac{(r+1) + (r+2) + \cdots + (r+k)}{k} = r + \frac{k+1}{2},$$

which is the average of the ranks that these individuals would have
received had there been no ties.
This tie then does not affect the mean of the ranks. However,
it affects the variance. The sum of squares of untied ranks would be

$$(r+1)^2 + (r+2)^2 + \cdots + (r+k)^2 = kr^2 + k(k+1)r + \tfrac{1}{6}k(k+1)(2k+1),$$
and the sum of squares of the tied ranks is

$$k\left(r + \frac{k+1}{2}\right)^2 = kr^2 + k(k+1)r + \tfrac{1}{4}k(k+1)^2,$$

the difference being $(k^3-k)/12$. Consequently, the variance is
lowered by $(k^3-k)/12n$ in the case of tied ranks. Also, it is obvious
that the effect of tying different sets is additive.




Now, suppose that in the ranking with respect to the first
character, there are s ties of length $k_1, k_2, \ldots, k_s$ and in the ranking
with respect to the second character, there are t ties of length
$k'_1, k'_2, \ldots, k'_t$. The variances would then be

$$s_u^2 = (n^2-1)/12 - T_u, \text{ where } T_u = \sum_{i=1}^{s} (k_i^3 - k_i)/12n,$$

and $$s_v^2 = (n^2-1)/12 - T_v, \text{ where } T_v = \sum_{j=1}^{t} (k_j'^3 - k'_j)/12n.$$
Similarly, since
$$2\,\mathrm{cov}(u, v) = s_u^2 + s_v^2 - \sum d_i^2/n,$$
the covariance for the case of tied ranks would be
$$\mathrm{cov}(u, v) = (n^2-1)/12 - (T_u + T_v)/2 - \sum d_i^2/2n,$$

so that Spearman's rank correlation coefficient in the case of tied
ranks becomes

$$r_R = \frac{\dfrac{n^2-1}{12} - \dfrac{T_u + T_v}{2} - \dfrac{\sum d_i^2}{2n}}{\sqrt{\dfrac{n^2-1}{12} - T_u}\ \sqrt{\dfrac{n^2-1}{12} - T_v}}. \qquad (14.3)$$
< . 
In case there is perfect agreement between the two series of
ranks, we shall have $u_i = v_i$ and hence

$$r_R = \frac{(n^2-1)/12 - T_u}{(n^2-1)/12 - T_u} = 1.$$
Again, if there is perfect disagreement between the two sets of
ranks, we shall have $v_i = n - u_i + 1$ and
$$s_u^2 = s_v^2 = (n^2-1)/12 - T_u.$$
Also, in that case $T_u = T_v$. Further,

$$\sum d_i^2 = 4\sum \left\{u_i - \frac{n+1}{2}\right\}^2 = 4n s_u^2.$$

As such, we then have

$$r_R = \frac{s_u^2 - 2s_u^2}{s_u^2} = -1.$$




Example 14.1 Ten hand-writings were ranked by two judges in a
competition. The rankings are given below. Calculate Spearman's
coefficient to measure the closeness of the two rankings.


Hand-writing 


The differences $d_i$ between the two series of ranks for the 10
hand-writings are:
-3, 4, -2, -1, -3, 7, -1, 1, -3, 1.
Hence $\sum d_i^2 = 9 + 16 + 4 + 1 + 9 + 49 + 1 + 1 + 9 + 1 = 100$.

Also, $n(n^2-1) = 10^3 - 10 = 990$.

Thus Spearman's $r_R$ is

$$r_R = 1 - \frac{6 \times 100}{990} = 1 - 0.606 = 0.394.$$
Thus there is no close similarity between the two sets of ranks.


Example 14.2 Two supervisors ranked 12 workers working under
them in order of efficiency as follows. Calculate Spearman's rank
correlation coefficient between the two rankings.

Supervisor 1    5    6    1   2   3   8½   8½   4   7   11    10   12
Supervisor 2    5½   5½   2   2   2   9    7    4   8   10½   12   10½

In the first ranking, there is only one tie of length 2. Thus

$$T_u = \frac{1}{12 \times 12}(2^3 - 2) = 0.0417.$$

In the second ranking, there are three ties of length 2, 3 and 2.
Here

$$T_v = \frac{1}{12 \times 12}\{(2^3 - 2) + (3^3 - 3) + (2^3 - 2)\} = 0.2500;$$
also,
$$\frac{n^2-1}{12} = \frac{143}{12} = 11.9167$$
and
$$\frac{1}{2n}\sum d_i^2 = \frac{1}{24}\left(\tfrac14 + \tfrac14 + 1 + 0 + 1 + \tfrac14 + 2\tfrac14 + 0 + 1 + \tfrac14 + 4 + 2\tfrac14\right) = \frac{12.5}{24} = 0.5208.$$
Thus
$$r_R = \frac{11.9167 - \dfrac{0.0417 + 0.2500}{2} - 0.5208}{\sqrt{11.9167 - 0.0417}\ \sqrt{11.9167 - 0.2500}} = \frac{11.2500}{\sqrt{11.8750 \times 11.6667}} = \frac{11.2500}{11.7704} = 0.956,$$
which indicates that the supervisors agree closely in their judgment.
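The tied-rank formula can be sketched in code as follows. The sketch takes each tie correction as $T = \sum (k^3 - k)/12n$ over the ties of one ranking, forms the coefficient from $(n^2-1)/12$, the two corrections and $\sum d_i^2/2n$, and compares the result with the product-moment coefficient computed directly on the averaged ranks. The two rank lists are a reading of the data of Example 14.2 that is consistent with the tie lengths and the value 0.956 quoted there; they, and the function names, should be treated as assumptions.

```python
from collections import Counter
from math import sqrt

def tie_term(ranks):
    """T = sum over ties of (k^3 - k) / 12n, k being the length of a tie."""
    n = len(ranks)
    return sum(k ** 3 - k for k in Counter(ranks).values()) / (12 * n)

def spearman_tied(u, v):
    """Spearman's coefficient for averaged (tied) ranks."""
    n = len(u)
    base = (n * n - 1) / 12
    t_u, t_v = tie_term(u), tie_term(v)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(u, v))
    cov = base - (t_u + t_v) / 2 - sum_d2 / (2 * n)
    return cov / sqrt((base - t_u) * (base - t_v))

def product_moment(x, y):
    """Ordinary product-moment correlation, with divisor n throughout."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
    var_x = sum((a - mean_x) ** 2 for a in x) / n
    var_y = sum((b - mean_y) ** 2 for b in y) / n
    return cov / sqrt(var_x * var_y)

# Assumed reading of the two supervisors' rankings of 12 workers.
sup1 = [5, 6, 1, 2, 3, 8.5, 8.5, 4, 7, 11, 10, 12]
sup2 = [5.5, 5.5, 2, 2, 2, 9, 7, 4, 8, 10.5, 12, 10.5]

r_tied = spearman_tied(sup1, sup2)
r_moment = product_moment(sup1, sup2)
print(round(r_tied, 3), round(r_moment, 3))
```

The agreement of the two values reflects the fact that the correction terms do no more than restore the exact variances of the averaged ranks.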


14.3 Kendall’s rank correlation coefficient 

Kendall's rank correlation coefficient $\tau$ may be obtained as
follows. First, let us suppose that there is no tie. Consider each
possible pair of individuals (i, j) and the order of this pair in the two
rankings. If the pair appears in the same order in both rankings, we
allot it a score of $+1$, and if it appears in reverse orders, a score of
$-1$. The score is thus obtained for each of the $\binom{n}{2} = \tfrac12 n(n-1)$ possible
pairs. We then define a rank correlation coefficient $\tau$ as

$$\tau = \frac{\text{total score}}{\text{maximum possible total score}} = \frac{\text{total score}}{\tfrac12 n(n-1)}. \qquad (14.4)$$




Obviously, $\tau = +1$ for perfect agreement, because the score for
each pair, each being in the same order in both rankings, is $+1$;
and $\tau = -1$ for perfect disagreement, because the score for each pair,
the pair being in reverse orders in the two rankings, is now $-1$.

Suppose the ranking in one series is in the natural order, viz. the
order $1, 2, \ldots, n$. Let us consider the corresponding ranking in
the other series. Suppose out of the $\binom{n}{2}$ pairs for the second series, P
pairs have ranks in the natural order and Q pairs have ranks in the
reverse order. Obviously, the P pairs will receive a score of $+1$
each, while the Q pairs will receive a score of $-1$ each. Thus,
according to (14.4),

$$\tau = \frac{P - Q}{\binom{n}{2}} = 1 - \frac{2Q}{\binom{n}{2}} \qquad (14.5a)$$

$$= \frac{2P}{\binom{n}{2}} - 1, \qquad (14.5b)$$

since P + Q = total number of pairs $= \binom{n}{2}$.

This indicates that in case neither of the two series of ranks has ties,
in computing $\tau$ one need determine P only (or Q only).

$\tau$ may also be regarded as a product-moment coefficient. For the
ranking with respect to the first character, let $u_i$ be the rank of the
ith individual. For the pair (i, j), with $i < j$, we define $a_{ij}$ such that

$$a_{ij} = \begin{cases} +1 & \text{if } u_i < u_j, \\ -1 & \text{if } u_i > u_j. \end{cases}$$

We similarly define $b_{ij}$ for the ranking with respect to the second
character. It can then be easily verified that

$$\tau = \frac{\sum a_{ij} b_{ij}}{\sqrt{\sum a_{ij}^2}\ \sqrt{\sum b_{ij}^2}}, \qquad (14.6)$$
each summation being over all the possible pairs (i, j), with $i < j$.




In the case of tied ranks, we proceed almost in the same way, but
now we take

$a_{ij} = 0$ if $u_i = u_j$,
and, similarly, we take

$b_{ij} = 0$ if $v_i = v_j$.
Thus if there is a tie of length k, the score is reduced by $k(k-1)/2$,
since $a_{ij} \times b_{ij} = 0$ if either $a_{ij}$ or $b_{ij}$ or both are zero. Therefore, if
there are s ties of length $k_1, k_2, \ldots, k_s$ in the ranking with respect
to the first character, $\sum a_{ij}^2$ would be reduced to

$$\frac{n(n-1)}{2} - T'_u, \text{ where } T'_u = \frac12 \sum_{i=1}^{s} k_i(k_i - 1).$$

Similarly, if there are t ties of length $k'_1, k'_2, \ldots, k'_t$ in the ranking
with respect to the second character, then $\sum b_{ij}^2$ would be reduced to

$$\frac{n(n-1)}{2} - T'_v, \text{ where } T'_v = \frac12 \sum_{j=1}^{t} k'_j(k'_j - 1).$$

The total score also will, naturally, get reduced.
Thus the formula for $\tau$ in the case of tied ranks becomes

$$\tau = \frac{\text{total score}}{\sqrt{\dfrac{n(n-1)}{2} - T'_u}\ \sqrt{\dfrac{n(n-1)}{2} - T'_v}}. \qquad (14.7)$$


It is apparent that both the coefficients, $r_R$ and $\tau$, are easy to
calculate, and these have been advocated as rough and ready substitutes
for the product-moment correlation between two measurable
characters. In case the characters are not measurable in practice
or are very difficult to measure, the rank correlation coefficients may
be used as measures of the correspondence between the two
characters. Kendall's $\tau$ has an advantage over Spearman's $r_R$ in that
it may be adapted more easily to the theory of sampling.

Example 14.3 Let us calculate Kendall's $\tau$ coefficient for the
data of Example 14.1.

To calculate $\tau$, it is convenient to rearrange one ranking so as
to put it in the natural order: $1, 2, \ldots, n$. If we do so for the
ranking by Judge 1, the corresponding ranking by Judge 2 becomes:

2, 1, 6, 5, 7, 9, 10, 4, 8 and 3.

The score obtained by considering the first member, 2, in conjunction
with the others is 8 - 1 = 7, because only 1 is smaller than 2.
Similarly, the score involving the member 1 is 8, the score involving
the member 6 is 4 - 3 = 1, and so on. The total score is

7 + 8 + 1 + 2 + 1 - 2 - 3 + 0 - 1 = 19 - 6 = 13.
On the other hand, the maximum possible score is (10 × 9)/2 = 45.

Thus
$\tau$ = 13/45 = 0.289.
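The counting in Example 14.3 can be mirrored by a short routine that finds P, the number of pairs in natural order, and Q, the number in reverse order, when the first ranking has been put in natural order and there are no ties. The function name below is an illustrative choice, not part of the text.

```python
from itertools import combinations

def concordant_discordant(second):
    """P and Q for a ranking listed against the natural order 1, 2, ..., n
    of the first ranking (no ties assumed)."""
    p = q = 0
    for earlier, later in combinations(second, 2):  # keeps list order
        if earlier < later:
            p += 1      # pair in the natural order: score +1
        else:
            q += 1      # pair in the reverse order: score -1
    return p, q

judge2 = [2, 1, 6, 5, 7, 9, 10, 4, 8, 3]   # ranking by Judge 2
p, q = concordant_discordant(judge2)
pairs = len(judge2) * (len(judge2) - 1) // 2

tau = (p - q) / pairs
print(p, q, p - q, round(tau, 3))
```

Since P + Q equals the total number of pairs, the same value follows from Q alone as 1 - 2Q/45, or from P alone as 2P/45 - 1.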


Example 14.4 To calculate $\tau$ for the data of Example 14.2 we
rearrange the ranking of Supervisor 1 in the natural order, and
then we have the two sets of ranks as follows:

Supervisor 1    1   2   3   4   5    6    7   8½   8½   10   11    12
Supervisor 2    2   2   2   4   5½   5½   8   9    7    12   10½   10½

The total score in this case is
9 + 9 + 9 + 8 + 6 + 6 + 3 + 3 + 3 - 2 - 0 = 54.
Here $T'_u = \tfrac12(2 \times 1) = 1$,
and $T'_v = \tfrac12\{2 \times 1 + 3 \times 2 + 2 \times 1\} = 5$.
Hence, from formula (14.7),
$$\tau = \frac{54}{\sqrt{66 - 1}\ \sqrt{66 - 5}} = 0.858.$$

Thus both Spearman's $r_R$ and Kendall's $\tau$ indicate a very high
degree of agreement between the two series of ranks.
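With ties, the scoring is most easily expressed through the signs $a_{ij}$ and $b_{ij}$ (each $+1$, $-1$ or 0), the total score being $\sum a_{ij}b_{ij}$ and each sum of squares being $n(n-1)/2$ less $\tfrac12\sum k(k-1)$ over the ties of that ranking. The sketch below follows that recipe. The rank lists are an assumed reading of the rearranged data of Example 14.4, consistent with the total score of 54 quoted there; the function names are illustrative.

```python
from collections import Counter
from itertools import combinations
from math import sqrt

def sign(x):
    """+1, 0 or -1 according to the sign of x."""
    return (x > 0) - (x < 0)

def tie_term(ranks):
    """Half the sum of k(k - 1) over ties of length k."""
    return sum(k * (k - 1) for k in Counter(ranks).values()) / 2

def kendall_tied(u, v):
    """Total score and Kendall's coefficient when tied ranks may occur."""
    n = len(u)
    score = sum(sign(u[j] - u[i]) * sign(v[j] - v[i])
                for i, j in combinations(range(n), 2))
    max_pairs = n * (n - 1) / 2
    denom = sqrt((max_pairs - tie_term(u)) * (max_pairs - tie_term(v)))
    return score, score / denom

# Assumed reading of the rearranged rankings of 12 workers.
sup1 = [1, 2, 3, 4, 5, 6, 7, 8.5, 8.5, 10, 11, 12]
sup2 = [2, 2, 2, 4, 5.5, 5.5, 8, 9, 7, 12, 10.5, 10.5]

score, tau = kendall_tied(sup1, sup2)
print(score, tie_term(sup1), tie_term(sup2), round(tau, 3))
```

A pair tied in either ranking contributes nothing to the score, which is why the denominators, and not only the numerator, have to be reduced.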


14.4 Grade Correlation 

The ranking of an individual, as the rth in a group, may be
regarded as a numerical statement to the effect that there are (r-1)
members who are given precedence over that individual. We will
then define the grade of an individual as the proportion of individuals
in the whole group with a lower variate value than the value possessed
by that individual. If we have a finite population of size N, the
grade of an individual ranked, according to the variate values, as
the rth (assuming that the ranking proceeds from the higher to the
lower variate values) will be $\dfrac{N-r}{N}$. In case the population is infinite
and the variable continuous, its members cannot be ranked; but if
a sample of size n is selected, the grade of an individual with rank r
can be estimated if it is assumed that one half of that member is to
be assigned to each of the two parts into which the variate value
divides the variate range, and we have $g_r$, the grade corresponding
to rank r, as

$$g_r = \frac{n - r + \tfrac12}{n}. \qquad (14.8)$$


For a continuous bivariate distribution, there will be no rank
correlation, but one can nevertheless think of a grade correlation.
To each individual of the population, there will be attached two
grades corresponding to the two variate values, and the product-
moment correlation of these grades is the grade correlation. For a
bivariate normal distribution with correlation coefficient $\rho$, it can be
shown that

$$\rho = 2\sin(\pi\rho_g/6), \qquad (14.9)$$
where $\rho_g$ is the grade correlation. This relation, however, should
not be used to transform a rank correlation coefficient obtained from
a sample into the product-moment correlation coefficient in that
sample, or in the population from which the sample was taken, unless
the population is known to be bivariate normal and the sample size
is sufficiently large.

Example 14.5 Find an estimate of the correlation coefficient in a
bivariate normal population if the rank correlation from a sample
is 0.45.

Here $\rho_g$ (estimated) = 0.45,
so that $\pi\rho_g/6$ (estimated) = 13.5°.
Thus $\rho$ (estimated) = $2\sin(\pi\rho_g/6)$ (estimated)
= 2 sin 13.5° = 0.47.
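The conversion used in Example 14.5 amounts to one line of arithmetic, shown here as a sketch. It applies the relation $\rho = 2\sin(\pi\rho_g/6)$, which presupposes a bivariate normal population and a reasonably large sample; the function name is an illustrative choice.

```python
from math import degrees, pi, sin

def rho_from_grade_correlation(rho_g):
    """Product-moment correlation implied by a grade correlation rho_g,
    assuming a bivariate normal population."""
    return 2 * sin(pi * rho_g / 6)

rho_g = 0.45                           # rank correlation from the sample
angle = degrees(pi * rho_g / 6)        # the angle, in degrees
rho = rho_from_grade_correlation(rho_g)
print(round(angle, 1), round(rho, 2))
```

At the extremes the relation gives $\rho = \pm 1$ for $\rho_g = \pm 1$ and $\rho = 0$ for $\rho_g = 0$, so the adjustment matters only for intermediate values.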




14.5 Intra-class correlation 

We have considered in Chapter 12 the correlation between two
clearly defined variates, such as the marks in college test and marks
in university examination for a group of students or the family
income and percentage of family income spent on food for a group
of families. There sometimes arise cases in which we require the
correlation with respect to a particular variate between members of
the same class, e.g. with respect to marks in a university examination
of students belonging to the same tutorial group or with respect
to height of brothers in the same family. By correlation here we
mean the extent to which the members of the same class (group
or 'family') resemble each other with respect to the given variable.
Such a correlation we shall call intra-class correlation, to distinguish it
from ordinary correlation, which may be called inter-class correlation.

Suppose we are investigating the correlation between heights of
brothers, and suppose there are two brothers in each family. If we
take the height of the elder brother or the taller brother as the first
variate and the height of the younger or the shorter as the second
variate and find the correlation, we would get the correlation
between the height of elder brother and the height of younger brother
or between the height of taller brother and the height of shorter
brother and not the relationship of heights of brothers in general.
To get the relationship of heights of brothers in general, we have to
take in turn the height of each brother as the first variate and that of
the other as the second variate, and thus get two pairs of heights for
each family. Similarly, if there are k brothers in the family, there
will be k(k-1) pairs. Thus if we have p families of k brothers each,
there will be pk(k-1) pairs of values in the correlation table. Let
$x_{ij}$ denote the variate value for the jth member of the ith family
($i = 1, 2, \ldots, p$; $j = 1, 2, \ldots, k$). If we arbitrarily regard the first
column of the correlation table as corresponding to a variate U and
the second column as corresponding to a variate V, then the mean of
each variate is given by

$$\bar{U} = \bar{V} = \frac{1}{pk(k-1)} \sum_{i=1}^{p}\sum_{j=1}^{k} (k-1)\,x_{ij} = \frac{1}{pk}\sum_{i}\sum_{j} x_{ij} = \bar{x}, \text{ the mean of } x, \qquad (14.10)$$




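since each of the values $x_{ij}$ occurs (k-1) times in each column of the
correlation table, along with the values for the other (k-1) members
of the family occurring in the other column.

Similarly, the variance of each variate is given by
$$s_U^2 = s_V^2 = \frac{1}{pk(k-1)} \sum_{i}\sum_{j} (k-1)(x_{ij} - \bar{x})^2 = \frac{1}{pk}\sum_{i}\sum_{j} (x_{ij} - \bar{x})^2 = s^2, \text{ the total variance of } x. \qquad (14.11)$$
The covariance between U and V is given by

$$\mathrm{cov}(U, V) = \frac{1}{pk(k-1)} \sum_{i=1}^{p} \sum_{j \ne j'} (x_{ij} - \bar{x})(x_{ij'} - \bar{x}),$$

where the second summation extends over all pairs of members (j, j')
in the ith family, with $j \ne j'$. But

$$\sum_{j \ne j'} (x_{ij} - \bar{x})(x_{ij'} - \bar{x}) = \sum_{j=1}^{k}\sum_{j'=1}^{k} (x_{ij} - \bar{x})(x_{ij'} - \bar{x}) - \sum_{j=1}^{k} (x_{ij} - \bar{x})^2$$

$$= k^2(\bar{x}_i - \bar{x})^2 - \sum_{j=1}^{k} (x_{ij} - \bar{x})^2,$$

where $\bar{x}_i = \dfrac{1}{k}\sum_{j=1}^{k} x_{ij}$, the mean of x in the ith family.
Thus

$$\mathrm{cov}(U, V) = \frac{1}{pk(k-1)} \left\{ k^2 \sum_{i=1}^{p} (\bar{x}_i - \bar{x})^2 - \sum_{i=1}^{p}\sum_{j=1}^{k} (x_{ij} - \bar{x})^2 \right\}. \qquad (14.12)$$
Writing $s_m^2 = \dfrac{1}{p}\sum_{i=1}^{p} (\bar{x}_i - \bar{x})^2$, the variance of the means of the p families,
we have

$$\mathrm{cov}(U, V) = \frac{k s_m^2 - s^2}{k-1}.$$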



Thus the coefficient of intra-class correlation, $r_I$, is given by

$$r_I = \frac{\mathrm{cov}(U, V)}{\sqrt{\mathrm{var}(U)\,\mathrm{var}(V)}} = \frac{k s_m^2 - s^2}{(k-1)s^2} = \frac{1}{k-1}\left\{\frac{k s_m^2}{s^2} - 1\right\}. \qquad (14.13)$$

Note that $0 \le s_m^2 \le s^2$. Hence this coefficient is a maximum, viz.
equal to $+1$, when $s_m^2 = s^2$, i.e. when the variance between the means
is equal to the total variance, which happens when the variance
within families is zero. In this case the variate-values for members
within each family are all equal.

Again, this coefficient is a minimum, viz. equal to $-\dfrac{1}{k-1}$, when
$s_m^2$, the variance between family means, is zero, which happens when
the variance within families is the maximum possible (for a given
total variance). Thus the coefficient of intra-class correlation may be
looked upon as a measure of the extent to which the total variance
may be explained away by the variance between means.

Some caution is necessary in the interpretation of the intra-class
correlation. It is clearly seen that here $r_I$ varies from $-\dfrac{1}{k-1}$ to $+1$.
The lower limit is larger than $-1$ unless k=2. It is thus a skew
coefficient in the sense that a negative value has not the same
significance (as a departure from zero) as the corresponding positive
value.

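In the general case, when there are $k_i$ members in the ith class,
in the correlation table each member of the ith class will appear
$(k_i - 1)$ times in each column in association with the other members
of the class. Here

$$\bar{U} = \bar{V} = \frac{1}{N}\sum_{i=1}^{p} (k_i - 1) \sum_{j=1}^{k_i} x_{ij} = \frac{1}{N}\sum_{i=1}^{p} k_i(k_i - 1)\,\bar{x}_i = \bar{x}_0, \text{ say}$$
(which is not the grand mean of x),

where $N = \sum_{i=1}^{p} k_i(k_i - 1)$,

and the first summation is over the p classes, while the second is
over all members of the ith class. Similarly,

$$s_U^2 = s_V^2 = \frac{1}{N}\sum_{i=1}^{p} (k_i - 1) \sum_{j=1}^{k_i} (x_{ij} - \bar{x}_0)^2.$$

Again,
$$\mathrm{cov}(U, V) = \frac{1}{N}\sum_{i=1}^{p} \sum_{j, j'} (x_{ij} - \bar{x}_0)(x_{ij'} - \bar{x}_0), \text{ with } j \ne j'$$
$$= \frac{1}{N}\sum_{i=1}^{p} \sum_{j, j'} (x_{ij} - \bar{x}_0)(x_{ij'} - \bar{x}_0) - \frac{1}{N}\sum_{i=1}^{p}\sum_{j=1}^{k_i} (x_{ij} - \bar{x}_0)^2$$

(where $\sum_{j, j'}$ now extends over all possible pairs, including the case $j = j'$)
$$= \frac{1}{N}\sum_{i=1}^{p} k_i^2(\bar{x}_i - \bar{x}_0)^2 - \frac{1}{N}\sum_{i=1}^{p}\sum_{j=1}^{k_i} (x_{ij} - \bar{x}_0)^2,$$
where $\bar{x}_i$ is the mean of the ith class, being equal to $\dfrac{1}{k_i}\sum_{j=1}^{k_i} x_{ij}$.
Hence

$$r_I = \frac{\mathrm{cov}(U, V)}{\sqrt{\mathrm{var}(U)\,\mathrm{var}(V)}} = \frac{\sum_i k_i^2(\bar{x}_i - \bar{x}_0)^2 - \sum_i\sum_j (x_{ij} - \bar{x}_0)^2}{\sum_i (k_i - 1)\sum_j (x_{ij} - \bar{x}_0)^2}. \qquad (14.14)$$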

Example 14.6 The weights in gm. of a number of copper wires,
each of length 1 metre, were obtained. These are shown below
classified according to the die from which they come. Determine
the intra-class correlation.

                 Die No.
  I       II      III     IV      V
  1.33    1.30    1.32    1.31    1.30
  1.32    1.35    1.29    1.29    1.32
  1.36    1.33    1.31    1.33    1.33
  1.35    1.34    1.28    1.31    1.33

The intra-class correlation coefficient is invariant under a change
of base and scale. Let our new variable be
u = 100(x - 1.28),
where x is the original variable.
This means that
$u_{ij} = 100(x_{ij} - 1.28)$
for i = 1, 2, 3, 4, 5 and j = 1, 2, 3, 4. The values $u_{ij}$ are shown
in Table 14.1.

TABLE 14.1
WEIGHTS OF COPPER WIRES AFTER CHANGE OF BASE AND SCALE

                 Die No.
         I     II    III   IV    V
         5     2     4     3     2
         4     7     1     1     4
         8     5     3     5     5
         7     6     0     3     5
Total    24    20    8     12    16

Here the grand mean is $\bar{u} = \dfrac{80}{20} = 4.0$.

Also, $$s^2 = \frac{1}{20}\sum_{i}\sum_{j} (u_{ij} - \bar{u})^2$$
$$= \frac{1}{20}\{1 + 0 + 16 + 9 + 4 + 9 + 1 + 4 + 0 + 9 + 1 + 16 + 1 + 9 + 1 + 1 + 4 + 0 + 1 + 1\}$$
$$= \frac{88}{20} = 4.4,$$

and $$s_m^2 = \frac{1}{5}\sum_{i} (\bar{u}_i - \bar{u})^2 = \frac{1}{5}\{4 + 1 + 4 + 1 + 0\} = 2.$$

Hence the intra-class correlation coefficient $r_I$ is given by

$$r_I = \frac{1}{k-1}\left\{\frac{k s_m^2}{s^2} - 1\right\} = \frac{1}{3}\left\{\frac{4 \times 2}{4.4} - 1\right\} = \frac{0.818}{3} = 0.273.$$
This indicates that the copper wires coming from the same die
resemble each other, in respect of their weights, only to a moderate
degree.
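The computation of Example 14.6 can be retraced in two ways, as a sketch: once from the short formula $\{k s_m^2/s^2 - 1\}/(k-1)$, with $s^2$ the total variance and $s_m^2$ the variance of the class means, and once by actually building the symmetric correlation table of all ordered pairs within each class and taking its product-moment correlation. The function names are illustrative; the data are the wire weights of the example, in their original units.

```python
from math import sqrt

def intraclass_by_formula(groups):
    """Intra-class correlation for classes of equal size k."""
    p, k = len(groups), len(groups[0])
    values = [x for g in groups for x in g]
    grand = sum(values) / (p * k)
    total_var = sum((x - grand) ** 2 for x in values) / (p * k)
    means_var = sum((sum(g) / k - grand) ** 2 for g in groups) / p
    return (k * means_var / total_var - 1) / (k - 1)

def intraclass_by_pairs(groups):
    """Product-moment correlation over all ordered pairs within classes."""
    pairs = [(g[j], g[m]) for g in groups
             for j in range(len(g)) for m in range(len(g)) if j != m]
    n = len(pairs)
    mean_u = sum(a for a, _ in pairs) / n
    mean_v = sum(b for _, b in pairs) / n
    cov = sum((a - mean_u) * (b - mean_v) for a, b in pairs) / n
    var_u = sum((a - mean_u) ** 2 for a, _ in pairs) / n
    var_v = sum((b - mean_v) ** 2 for _, b in pairs) / n
    return cov / sqrt(var_u * var_v)

# Weights (gm.) of copper wires, one list per die.
dies = [
    [1.33, 1.32, 1.36, 1.35],
    [1.30, 1.35, 1.33, 1.34],
    [1.32, 1.29, 1.31, 1.28],
    [1.31, 1.29, 1.33, 1.31],
    [1.30, 1.32, 1.33, 1.33],
]

r_formula = intraclass_by_formula(dies)
r_pairs = intraclass_by_pairs(dies)
print(round(r_formula, 3), round(r_pairs, 3))
```

Working in the original units rather than with u = 100(x - 1.28) makes no difference to the result, which illustrates the invariance under change of base and scale.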




14.6 Population intra-class correlation 
In many situations, the value $x_{ij}$ may be supposed to have arisen
from sampling a population in two stages: first a family is chosen
at random from a whole set of families, and next a member is selected
at random from all members of the chosen family (and the value
of x for this member being noted). In such a case, one may write
$$x_{ij} = \mu + m_i + e_{ij}, \qquad (14.15)$$
where $\mu$ (a constant) is the mean of x for all individuals in all
families taken together, $m_i$ (a random variable) is the amount by
which the mean of the chosen family differs from $\mu$ and $e_{ij}$ (a random
variable) is the amount by which $x_{ij}$ differs from the family mean.
We shall denote the variance of all $m_i$ by $\sigma_m^2$, while the variance
of $e_{ij}$ (assumed to be the same for each i as well as j) will be denoted
by $\sigma_e^2$. Further, it will be assumed that the $m_i$ are mutually
independent, the $e_{ij}$ are mutually independent, while the $m_i$ are
independent of the $e_{ij}$.*
Now, if we consider two members of the same family, for which
the values of x are $x_{ij}$ and $x_{ij'}$, then
$$\mathrm{var}(x_{ij}) = \mathrm{var}(x_{ij'}) = \sigma_m^2 + \sigma_e^2 = \sigma^2 \text{ (say)},$$
while
$$\mathrm{cov}(x_{ij}, x_{ij'}) = \sigma_m^2.$$
Hence the correlation coefficient between $x_{ij}$ and $x_{ij'}$, which is
nothing but the intra-class correlation coefficient under the above
model, is
$$\rho_I = \frac{\sigma_m^2}{\sigma^2}. \qquad (14.16)$$

This is seen to be the proportion of the total variance of x that is
represented by the variance of family means. Hence the higher the
value of $\sigma_m^2$ (i.e. the smaller the value of $\sigma_e^2$) relative to $\sigma^2$, the higher
is the intra-class correlation.

Formula (14.16) may be seen to have the same form as formula
(14.13) if we remember that under the present model k is supposed
to be practically infinite.
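The remark about k being practically infinite can be made explicit by a one-line limiting argument. Here $s_m^2$ is written for the variance of the family means and $s^2$ for the total variance in the sample coefficient (14.13); both symbols are used as shorthand for this sketch only.

```latex
r_I \;=\; \frac{1}{k-1}\left\{\frac{k\,s_m^2}{s^2} - 1\right\}
    \;=\; \frac{s_m^2/s^2 \;-\; 1/k}{1 \;-\; 1/k}
    \;\longrightarrow\; \frac{s_m^2}{s^2}
    \qquad (k \to \infty),
```

so that the sample coefficient tends to the ratio of the variance of family means to the total variance, which is the form taken by the population coefficient in (14.16).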


*For a further discussion of this model, see Section 1.5, Vol. 2 




14.7 Tetrachoric correlation

Consider a 2×2 frequency table, where the two characters
involved may not be variables at all. A measure of the association
between the two characters may still be obtained, based on the
ordinary product-moment correlation coefficient. The method to
be considered here is thus especially suitable when we wish to
measure the association between two characters neither of which is
measurable, but both of which lend themselves to separation into
two categories.

Thus students may be classified as above average or below 
average in intelligence and also as emotionally stable or unstable ; 
families may be classified as rich or poor and also as big or small. 


TABLE 14.2
A 2×2 FREQUENCY TABLE

                      A-class
B-class           $A_1$    $A_2$    Marginal total
$B_1$             $a$      $b$      $a+b$
$B_2$             $c$      $d$      $c+d$
Marginal total    $a+c$    $b+d$    $n$


Let the two characters be $A$ and $B$. Further, let the total
frequency and the cell-frequencies of the 2×2 table be as shown in
Table 14.2. If this table is assumed to be derived through a double
dichotomy of a bivariate normal distribution, say with p.d.f.
$$f(x, y) = \frac{1}{2\pi\sqrt{1-\rho^2}} \exp\left[-\frac{x^2 - 2\rho xy + y^2}{2(1-\rho^2)}\right], \tag{14.17}$$
then one may determine $\rho$ in terms of $a$, $b$, $c$, $d$ and $n$. The value of
$\rho$ obtained in this manner is the desired measure of association and
is called tetrachoric $r$. Thus this mode of measuring association
assumes that the two characters under study are essentially continuous
and would be normally distributed if it were possible to
measure them.




Here we shall suppose that the variables $x$ and $y$ correspond to
the $A$-classification and the $B$-classification, respectively. Let the
value of $x$ that forms the boundary between the classes $A_1$ and $A_2$ be
$h$ and, similarly, let $k$ be the value of $y$ that forms the boundary
between the classes $B_1$ and $B_2$. Since the marginal distributions of
$x$ and $y$ are normal with p.d.f.'s
$$g(x) = \frac{1}{\sqrt{2\pi}} \exp[-x^2/2] \tag{14.18}$$
and
$$h(y) = \frac{1}{\sqrt{2\pi}} \exp[-y^2/2], \tag{14.19}$$
we have
$$\int_h^\infty g(x)\,dx = (a+c)/n \tag{14.20}$$
and
$$\int_k^\infty h(y)\,dy = (a+b)/n. \tag{14.21}$$

Having determined $h$ and $k$ from the equations (14.20) and (14.21),
we are then to solve for $\rho$ the equation
$$\int_h^\infty \int_k^\infty f(x, y)\,dy\,dx = a/n. \tag{14.22}$$

Equation (14.22) can be expressed in the form
$$\sum_{i=0}^{\infty} \rho^i \tau_i \tau_i' = a/n. \tag{14.22a}$$

The functions $\tau_i$ are called the tetrachoric functions of $h$, and the
functions $\tau_i'$ the tetrachoric functions of $k$. These functions have
been tabulated for values of $i$ up to 19 in Biometrika Tables, Vol. I.
With the help of these tables, one can solve (14.22a) by successive
approximation and thus evaluate the tetrachoric coefficient $r$.
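In place of the printed tables, the series (14.22a) can be summed numerically. The sketch below rests on assumptions not stated in the text: it takes the tetrachoric functions in their usual form $\tau_0(h) = \int_h^\infty g(x)\,dx$ and $\tau_i(h) = He_{i-1}(h)\,g(h)/\sqrt{i!}$ for $i \ge 1$ (with $He_n$ the Hermite polynomials), truncates the series at 60 terms, and searches for $\rho$ by bisection over the arbitrary interval $(-0.95, 0.95)$. All four marginal totals are assumed to be non-zero, and the function names are illustrative.

```python
from math import exp, factorial, pi, sqrt
from statistics import NormalDist

def hermite_values(x, count):
    """He_0(x), ..., He_{count-1}(x) by the recurrence He_{n+1} = x He_n - n He_{n-1}."""
    values = [1.0, x]
    for n in range(1, count - 1):
        values.append(x * values[n] - n * values[n - 1])
    return values[:count]

def upper_quadrant_probability(h, k, rho, terms=60):
    """Truncated series for P[x > h, y > k], i.e. the left side of (14.22a)."""
    std = NormalDist()
    g_h = exp(-h * h / 2) / sqrt(2 * pi)
    g_k = exp(-k * k / 2) / sqrt(2 * pi)
    he_h = hermite_values(h, terms)
    he_k = hermite_values(k, terms)
    total = (1 - std.cdf(h)) * (1 - std.cdf(k))          # term with i = 0
    for i in range(1, terms + 1):
        total += rho**i / factorial(i) * he_h[i - 1] * he_k[i - 1] * g_h * g_k
    return total

def tetrachoric_r(a, b, c, d, lo=-0.95, hi=0.95):
    """Solve (14.22a) for rho by bisection (search interval is an assumption)."""
    n = a + b + c + d
    std = NormalDist()
    h = std.inv_cdf(1 - (a + c) / n)     # from (14.20)
    k = std.inv_cdf(1 - (a + b) / n)     # from (14.21)
    target = a / n                       # right side of (14.22)
    for _ in range(60):
        mid = (lo + hi) / 2
        if upper_quadrant_probability(h, k, mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

# Hypothetical table with both dichotomies at the median.
print(round(tetrachoric_r(200, 100, 100, 200), 4))
```

For a table dichotomised at both medians ($h = k = 0$) the double integral equals $1/4 + \arcsin(\rho)/2\pi$, which gives an independent check on the routine.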


14.8 Biserial correlation 

This measure is appropriate for a $t \times 2$ frequency table, where
the dichotomy may relate to some qualitative character but the
other classification relates to a variable.




TABLE 14.3
A $t \times 2$ FREQUENCY TABLE

A-class    Class 1    Class 2    ...    Class $t$    Total
$A_1$      ...        ...        ...    ...          $f$
$A_2$      ...        ...        ...    ...          $n-f$
Total      $n_1$      $n_2$      ...    $n_t$        $n$


If the two characters are denoted by $A$ and $y$, respectively, then
the frequency table may be laid out in the manner of Table 14.3.
It will now be assumed that the $A$-classes arise from the division
of the range of a continuous variable $x$ by a point, say $h$, the class
$A_1$ corresponding to values greater than $h$ and the class $A_2$ to values
less than or equal to $h$. If we further assume that $x$ and $y$ are jointly
normally distributed variables with means $\mu_x$ and $\mu_y$, variances
$\sigma_x^2$ and $\sigma_y^2$, and correlation $\rho$, then the mean of the conditional
distribution of $y$ for the class $A_1$ (i.e. for $h < x < \infty$) is
$$\mu_1 = \frac{\int_h^\infty \int_{-\infty}^{\infty} y f(x, y)\,dy\,dx}{\int_h^\infty g(x)\,dx} = \frac{\int_h^\infty \left[\int_{-\infty}^{\infty} y f(y|x)\,dy\right] g(x)\,dx}{\int_h^\infty g(x)\,dx},$$
where $f(x, y)$ is used to denote the joint density function of $x$ and $y$,
$g(x)$ the marginal density function of $x$ and $f(y|x)$ the conditional
density function of $y$, given $x$.


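Using the results of Section 12.9 and denoting $\int_h^\infty g(x)\,dx$ by $P$,
we then have
$$\mu_1 = \int_h^\infty \left\{\mu_y + \rho\frac{\sigma_y}{\sigma_x}(x - \mu_x)\right\} g(x)\,dx \Big/ P$$
$$= \mu_y + \rho\frac{\sigma_y}{\sigma_x} \times \frac{1}{P} \int_h^\infty (x - \mu_x)\,\frac{1}{\sigma_x\sqrt{2\pi}} \exp\left[-\frac{(x - \mu_x)^2}{2\sigma_x^2}\right] dx$$
$$= \mu_y + \rho\sigma_y \times \frac{1}{P\sqrt{2\pi}} \int_\tau^\infty u \exp[-u^2/2]\,du \qquad \left(u = \frac{x - \mu_x}{\sigma_x},\ \tau = \frac{h - \mu_x}{\sigma_x}\right)$$
$$= \mu_y + \rho\sigma_y \frac{\phi(\tau)}{P}, \tag{14.23}$$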


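$\phi$ being the p.d.f. of the standard normal distribution.
In the same way, the mean of the conditional distribution of $y$
for the class $A_2$ (i.e. for $-\infty < x \le h$) is
$$\mu_2 = \mu_y - \rho\sigma_y \frac{\phi(\tau)}{Q}, \tag{14.24}$$
where $Q$ stands for $\int_{-\infty}^h g(x)\,dx$.

From (14.23) and (14.24), we have
$$\mu_1 - \mu_2 = \rho\sigma_y \phi(\tau)\left(\frac{1}{P} + \frac{1}{Q}\right) = \frac{\rho\sigma_y \phi(\tau)}{PQ}$$
or
$$\rho = \frac{(\mu_1 - \mu_2)PQ}{\sigma_y \phi(\tau)}, \tag{14.25}$$
where
$$\tau = (h - \mu_x)/\sigma_x$$
is the value of the standard normal variable to be looked up while
consulting the normal tables for the ordinate $\phi$.

Now, $P$ and $Q$ will be estimated by $p = f/n$ and $q = 1 - f/n$,
respectively, $\sigma_y$ by $s$, the observed standard deviation of $y$, while $\mu_1$
and $\mu_2$ will be estimated by $\bar{y}_1$ and $\bar{y}_2$, the conditional means of $y$
corresponding to $A_1$ and $A_2$, respectively. An estimate of $\rho$ is thus
given by
$$r = \frac{(\bar{y}_1 - \bar{y}_2)\,pq}{s\,\phi(\tau)}, \tag{14.26}$$
which is called biserial correlation. Here $\tau$ is taken as the point of
the standard normal distribution with upper-tail area $p$.
If $\bar{y}$ be the unconditional mean of $y$, then
$$\bar{y} = p\bar{y}_1 + q\bar{y}_2,$$
so that we have the alternative formula
$$r = \frac{(\bar{y}_1 - \bar{y})\,p}{s\,\phi(\tau)}. \tag{14.27}$$
For some examples involving the use of tetrachoric and biserial
correlation, see [3].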
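The computation in (14.26) and (14.27) may be sketched as follows. The data are hypothetical, the function name is illustrative, and the standard deviation $s$ is computed with divisor $n$, which is an assumption about the convention intended.

```python
from math import exp, pi, sqrt
from statistics import NormalDist

def biserial_r(y_upper, y_lower):
    """Biserial correlation from the y-values in class A_1 (x > h) and class A_2 (x <= h).
    Returns the values given by formulas (14.26) and (14.27)."""
    f, n = len(y_upper), len(y_upper) + len(y_lower)
    p, q = f / n, 1 - f / n
    all_y = y_upper + y_lower
    y_bar = sum(all_y) / n
    s = sqrt(sum((y - y_bar) ** 2 for y in all_y) / n)   # divisor n (assumed convention)
    tau = NormalDist().inv_cdf(q)                         # point with upper-tail area p
    ordinate = exp(-tau * tau / 2) / sqrt(2 * pi)         # phi(tau)
    y1 = sum(y_upper) / f
    y2 = sum(y_lower) / (n - f)
    r_26 = (y1 - y2) * p * q / (s * ordinate)             # formula (14.26)
    r_27 = (y1 - y_bar) * p / (s * ordinate)              # formula (14.27)
    return r_26, r_27

# Hypothetical observations of y in the two A-classes.
r_26, r_27 = biserial_r([12.0, 15.0, 14.0, 18.0, 16.0],
                        [9.0, 11.0, 13.0, 10.0, 8.0, 12.0, 7.0])
print(round(r_26, 4), round(r_27, 4))
```

The two formulas should agree up to rounding error, since $\bar{y}_1 - \bar{y} = q(\bar{y}_1 - \bar{y}_2)$.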




Questions and exercises 


14.1 What is a rank correlation coefficient? Deduce Spear- 
man’s formula for rank correlation coefficient. How should the 
formula be modified for tied ranks ? 


14.2 Discuss the rationale behind Kendall's $\tau$ coefficient for rank
correlation. Also indicate how the formula can be adapted to the
case of tied ranks.


14.3 Show that both Spearman's $r_R$ and Kendall's $\tau$ will lie
between $-1$ and $+1$. Interpret the marginal cases.

14.4 Define intra-class correlation and distinguish it from inter-class
correlation. Derive the formula for intra-class correlation
when a variate $x$ is observed for $p$ families, each consisting of $k$
members.

14.5 Show that the coefficient derived in Exercise 14.4 lies between
the limits $-\frac{1}{k-1}$ and $+1$. Interpret the marginal cases.


14.6 Show that the intra-class correlation coefficient defined by 
(14.14) also takes its highest possible value +1 when the members of 
each family have the same value of x. 

Verify that formula (14.14) reduces to formula (14.13) in case
$k_i = k$, i.e. in case the families are all of the same size.


14.7 Write notes on tetrachoric correlation and biserial correlation,
and state where they would be appropriate.


14.8 Two judges rank a number of competitors in a certain art 
competition as follows : 


Competitor _ 
Des 2.. sta Sy Ree ar ee area 85 11 12 


a EADE Ln O BEE a S g 
A E N E ae E a a e E E AET, 


Measure the association between the judgments of the two judges
by using (1) Spearman's $r_R$ and (2) Kendall's $\tau$.

Ans. $r_R = 0.706$, $\tau = 0.515$.




14.9 Six boys and six girls are ranked below on the basis of their
performance in a mathematics test. Do you find any association
between sex and performance? Use (1) Spearman's $r_R$ and
(2) Kendall's $\tau$ formulae for tied ranks, assuming that, in ranking
with respect to sex, boys are considered superior to girls.


Sex | B.S Bs OBS G>°G- 8G) 8 G 
et S 
Rask Paes Rae’ pec saree i.i 65 8 95 Ib 95 12 
Se oe 
B=boy, Gegirl. 


Ans. $r_R = 0.342$, $\tau = 0.299$.

14.10 Find Spearman’s rank correlation coefficient for the data 
used in Exercise 12.17 and comment on this value and the value of 
the product-moment correlation coefficient for the same data. 

14.11 For each of six families, the heights in inches of three
brothers belonging to it are recorded below. Compute the coefficient
of intra-class correlation.

Family    Heights of brothers
1         69.5    70.6    72.3
2         71.2    70.8    72.0
3         65.6    67.2    66.7
4         62.2    63.6    63.5
5         68.0    70.5    70.5
6         64.4    64.3    64.6

Ans. $r_I = 0.910$.
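As a check on Exercise 14.11, the following sketch computes the intra-class correlation as the product-moment correlation over all ordered pairs of brothers within a family, using the common mean and variance of all the observations. This pairwise definition is assumed here to agree with formula (14.13) for families of equal size; the heights are read from the table above with the decimal points restored.

```python
from itertools import permutations

heights = [
    [69.5, 70.6, 72.3],
    [71.2, 70.8, 72.0],
    [65.6, 67.2, 66.7],
    [62.2, 63.6, 63.5],
    [68.0, 70.5, 70.5],
    [64.4, 64.3, 64.6],
]

def intraclass_correlation(families):
    """Correlation over all ordered within-family pairs (equal family sizes assumed)."""
    values = [x for family in families for x in family]
    mean = sum(values) / len(values)
    variance = sum((x - mean) ** 2 for x in values) / len(values)
    pairs = [(a, b) for family in families for a, b in permutations(family, 2)]
    covariance = sum((a - mean) * (b - mean) for a, b in pairs) / len(pairs)
    return covariance / variance

r_i = intraclass_correlation(heights)
print(round(r_i, 3))
```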

14.12 The birth-weights of babies born to 5 mothers in a
Calcutta nursing home are given below. Compute the intra-class
correlation.

Mother
1               2               3               4               5
6 lb. 3 oz.     7 lb. 9 oz.     5 lb. 10 oz.    8 lb. 6 oz.     6 lb. 9 oz.
6 lb. 8 oz.     8 lb. 2 oz.     6 lb. 2 oz.     7 lb. 10 oz.    6 lb. 6 oz.
6 lb. 0 oz.     6 lb. 4 oz.     7 lb. 5 oz.
6 lb. 4 oz.

Ans. $r_I = 0.826$.




14.13 Measure the association between use of quinine and 
attack of malaria for the data of Table 11.1 by means of tetrachoric 
correlation and comment. 


14.14 Evaluate the biserial correlation for the following data
and comment on the degree of association between age at death
and sex:


Age at death | 0-9 10-19 20-29 30-39 40-49 50-59 60-69 70-79 80-89 


Males 208 38 35 38 67 «#2113 188 181 19 


Frequency 
Females | 182 29 22 32 52 63 124 135 60 


SUGGESTED READING

[1] Fisher, R. A. Statistical Methods for Research Workers (Ch. 7).
Oliver & Boyd, 1948.

[2] Kendall, M. G. Rank Correlation Methods (Chs. 1-3). Charles
Griffin, 1948.

[3] McNemar, Q. Psychological Statistics (Ch. 12). John Wiley,
1962.

[4] Yule, G. U. and Kendall, M. G. Introduction to the Theory of
Statistics (Ch. 11). Charles Griffin, 1953.


15 RANDOM SAMPLING AND 
SAMPLING DISTRIBUTIONS 


15.1 Random sampling 

The selection of a sample (consisting of, say, $n$ members) from
a population (having, say, $N$ members) may be done in a number
of ways; that is to say, we may have different types of sampling.
Suppose, for instance, that a social scientist wants to determine for a
year the average income per family for families residing in Calcutta.
To know this figure exactly, he will have to know the income of each
of some ten lakhs of families living in the city. This will require a lot
of time and money, which the enquirer may not be in a position to
afford. To manage within his prescribed time and limited resources,
he will in such a case study a sample of families only (some 1,000 or
10,000 of them) and will base his conclusions on the characteristics
of the sample. In this investigation, the $n$ families in the sample
may be the first $n$ or the last $n$ appearing in the National Register
for Calcutta; alternatively, they may be chosen by considering, say,
every 100th family in the register beginning with the first entry;
and so on. All these methods, however, are defective in that they
frequently lead to unrepresentative samples. A more serious drawback
is that, in such sampling procedures, no idea can be obtained
from the sample regarding the possible deviations of the characteristics
of the sample from the characteristics of the population.

To avoid the second drawback, one may use what is called
probability sampling. It also takes care of the first defect in the long
run (in a sense to be explained later).* In this case, the sampling
procedure is such that each member of the population gets a definite
probability of being included in the sample. The simplest and the
most commonly used type of probability sampling is simple random
sampling (or random sampling, for short). In this kind of sampling,
each member of the population has the same probability of being
included in the sample.

* For example, if $\bar{x}$ be the sample mean of $x$ for random samples drawn from an
infinite population for which the mean of $x$ exists and is $\mu$, then $\bar{x}$ is a consistent
estimator of $\mu$.


We may again have two distinct types of simple random sampling.
In one case, the $n$ units of the sample are drawn from the population
one by one, the individual selected being returned to the population
after each drawing, in such a way that at each drawing each of the
$N$ members of the population gets the same probability $1/N$ of being
selected. This is simple random sampling with replacements (SRSWR).

Clearly, here the same unit of the population may occur more
than once in the sample; there are $N^n$ possible samples, regard
being had to the order in which the $n$ sample units appear, and each
has the probability $1/N^n$ to materialise.

If, on the other hand, the $n$ members of the sample are drawn
one by one but the member obtained at any drawing is not returned
to the population and if at each stage every remaining unit of the
population (at the $r$th drawing each of the remaining $N-r+1$ units)
is given the same probability ($\frac{1}{N-r+1}$ at the $r$th drawing) of being
included in the sample, then we have simple random sampling
without replacements (SRSWOR).

Here no member of the population can occur more than once in
the sample. There are $\binom{N}{n}$ conceivable samples, provided the order
in which the sample units are obtained is ignored, and each such
sample has the probability
$$\frac{n}{N} \times \frac{n-1}{N-1} \times \cdots \times \frac{1}{N-n+1} = \frac{1}{\binom{N}{n}}$$
to materialise. This is so because at the $r$th stage one is to choose
from $N-r+1$ individuals one of the $n-r+1$ individuals to be
included in the sample which have not yet been chosen in earlier
drawings. It may be seen that in this case, too, the probability that
any specified individual, say the $i$th, is selected at any drawing, say
the $k$th drawing, is
$$\frac{N-1}{N} \times \frac{N-2}{N-1} \times \cdots \times \frac{N-k+1}{N-k+2} \times \frac{1}{N-k+1} = \frac{1}{N},$$
as in simple random sampling with replacements.
It is obvious that if one takes $n$ individuals all at a time from the
population, giving equal probability to each of the $\binom{N}{n}$ combinations
of $n$ members out of the $N$ members in the population, one will still
have simple random sampling without replacements.
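The two probability statements for sampling without replacements can be verified by brute-force enumeration for a small case. The sketch below uses hypothetical values $N = 5$ and $n = 3$ and exact rational arithmetic; the variable names are illustrative.

```python
from fractions import Fraction
from itertools import permutations
from math import comb

N, n = 5, 3                          # hypothetical population and sample sizes
population = range(1, N + 1)

# Probability of one particular ordered sequence of n draws without replacement:
# 1/N * 1/(N-1) * ... * 1/(N-n+1).
ordered_prob = Fraction(1)
for r in range(1, n + 1):
    ordered_prob *= Fraction(1, N - r + 1)

sample_prob = {}                     # unordered sample -> total probability
draw_prob = {}                       # (unit, drawing number k) -> total probability
for ordered in permutations(population, n):
    key = frozenset(ordered)
    sample_prob[key] = sample_prob.get(key, 0) + ordered_prob
    for k, unit in enumerate(ordered, start=1):
        draw_prob[(unit, k)] = draw_prob.get((unit, k), 0) + ordered_prob

print(len(sample_prob), set(sample_prob.values()), set(draw_prob.values()))
```

Every unordered sample should carry probability $1/\binom{N}{n}$, and every unit should have probability $1/N$ of being selected at each drawing.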

Some practical methods of obtaining random samples will be 
discussed in Volume 2. 


15.2 Parameter, statistic and its sampling distribution

Generally in statistical investigations, our ultimate interest lies in 
one or more characters possessed by the members of the population. 
On taking a sample, we shall then observe the forms or the values 
of the characters for the individuals included in the sample. 
Supposing there is only one character of importance, it can be 
assumed, without any loss of generality, to be a variable x ; for the 
case of an attribute can also be tackled by methods meant for a 
variable.* If $x_i$ be the value of $x$ for the $i$th member of the sample,
then $x_1, x_2, \ldots, x_n$ are the sample observations.

Generally, again, our primary interest will be in knowing the 
values of different measures of the population distribution of x, like 
its mean, standard deviation, etc. A measure of this type, calculated 
on the basis of population values of x, is called a parameter. (The 
word 'parameter' is being used here in a broad sense. In a
narrower sense, a parameter is a measure that occurs in the probability
distribution of the variable; e.g., $\lambda$ is the parameter of a
Poisson variable, $\mu$ and $\sigma$ are the parameters of a normal variable.)
A corresponding measure computed on the basis of sample values is 
called a statistic. 

Since the sets of population members included in different 
samples from the same population may be different, the value of 
the statistic itself is liable to vary from one sample to another. These 
differences in the values of a statistic are called sampling fluctuations. 
Thus if a number of samples, each of size n, are taken from the 
same population and if for each sample the value of the statistic is 
calculated, a series of values of the statistic will be obtained. If the 
number of samples is large, these may be arranged into a frequency 
table. The frequency distribution of the statistic that would be
obtained if the number of samples, each of the same size (say $n$),
were infinite is called the sampling distribution of the statistic. In the
case of random sampling, the nature of the sampling distribution of
a statistic can be deduced theoretically, provided the nature of the
population is given, from considerations of probability theory.

*In the case of an attribute, one will deal with the sample frequencies for the
different classes, which are variables assuming non-negative integral values.

Like any other distribution, a sampling distribution may have 
its mean, standard deviation and moments of higher orders. Of 
particular importance is the standard deviation, which is designated 
as the standard error of the statistic. As illustrations, in the next section
we derive for the case of random sampling the means (expectations)
and standard errors of a sample mean and a sample proportion.

Some people prefer to use 0.6745 times the standard error, which
is called the probable error of the statistic. The relevance of the
probable error stems from the fact that for a normally distributed
variable $x$ with mean $\mu$ and s.d. $\sigma$,
$$P[\mu - 0.6745\sigma < x < \mu + 0.6745\sigma] = 0.50,$$
approximately.
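The constant 0.6745 can be checked numerically. The short sketch below uses the identity $P[|x - \mu| < c\sigma] = \mathrm{erf}(c/\sqrt{2})$ for a normal variable; the function name is an illustrative choice.

```python
from math import erf, sqrt

def central_probability(c):
    """P[mu - c*sigma < x < mu + c*sigma] for a normally distributed x."""
    return erf(c / sqrt(2))

prob = central_probability(0.6745)
print(prob)
```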


15.3 Expectation and standard error of sample mean 

Suppose a random sample of size $n$ is drawn from a population
of size $N$.

Let
$$X_\alpha \quad (\alpha = 1, 2, \ldots, N) \tag{15.1}$$
be the value of the variable $x$ for the $\alpha$th member of the population.
Then the population mean of $x$ is
$$\mu = \frac{1}{N} \sum_{\alpha} X_\alpha \tag{15.2}$$
and the population variance is
$$\sigma^2 = \frac{1}{N} \sum_{\alpha} (X_\alpha - \mu)^2. \tag{15.3}$$
Again, let us denote by
$$x_i \quad (i = 1, 2, \ldots, n) \tag{15.4}$$
the value of $x$ for the $i$th member (i.e. the member selected at the
$i$th drawing) of the sample. The sample mean of $x$ is then
$$\bar{x} = \frac{1}{n} \sum_{i} x_i. \tag{15.5}$$

For deriving the expectation and standard error of $\bar{x}$, we may
consider two distinct cases:


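Case 1. Random sampling with replacements
It is immediately seen from Theorems 3.6 and 3.7 that
$$E(\bar{x}) = \frac{1}{n} \sum_i E(x_i)$$
and
$$\mathrm{var}(\bar{x}) = E\{\bar{x} - E(\bar{x})\}^2 = \frac{1}{n^2} E\Big[\sum_i \{x_i - E(x_i)\}\Big]^2$$
$$= \frac{1}{n^2}\Big[\sum_i E\{x_i - E(x_i)\}^2 + \sum_{i \ne j} E\{x_i - E(x_i)\}\{x_j - E(x_j)\}\Big]$$
$$= \frac{1}{n^2}\Big[\sum_i \mathrm{var}(x_i) + \sum_{i \ne j} \mathrm{cov}(x_i, x_j)\Big].$$

To obtain $E(x_i)$ and $\mathrm{var}(x_i)$, we note that $x_i$ can assume the
values $X_1, X_2, \ldots, X_N$, each with probability $\frac{1}{N}$.

Hence
$$E(x_i) = \sum_\alpha X_\alpha P[x_i = X_\alpha] = \frac{1}{N} \sum_\alpha X_\alpha = \mu$$
and
$$\mathrm{var}(x_i) = E(x_i - \mu)^2 = \sum_\alpha (X_\alpha - \mu)^2 P[x_i = X_\alpha] = \frac{1}{N} \sum_\alpha (X_\alpha - \mu)^2 = \sigma^2,$$
for each $i$.
Again,
$$\mathrm{cov}(x_i, x_j) = E(x_i - \mu)(x_j - \mu) = \sum_{\alpha, \alpha'} (X_\alpha - \mu)(X_{\alpha'} - \mu) P[x_i = X_\alpha, x_j = X_{\alpha'}].$$

Since in sampling with replacements the composition of the
population remains the same throughout the sampling process, $x_j$ can
take any one of the values $X_1, X_2, \ldots, X_N$ with probability $1/N$,
irrespective of the value taken by $x_i$. In other words, for $i \ne j$, $x_i$
and $x_j$ are independent, so that
$$P[x_i = X_\alpha, x_j = X_{\alpha'}] = P[x_i = X_\alpha] P[x_j = X_{\alpha'}] = \frac{1}{N^2}.$$
Hence
$$\mathrm{cov}(x_i, x_j) = \frac{1}{N^2} \sum_{\alpha, \alpha'} (X_\alpha - \mu)(X_{\alpha'} - \mu) = \frac{1}{N^2} \sum_\alpha (X_\alpha - \mu) \times \sum_{\alpha'} (X_{\alpha'} - \mu) = 0,$$
for each $i$, $j$ ($i \ne j$), since $\sum_\alpha (X_\alpha - \mu) = \sum_{\alpha'} (X_{\alpha'} - \mu)$, being the sum of
the deviations of $X_1, X_2, \ldots, X_N$ from their mean, is zero.
Hence we have, finally,
$$E(\bar{x}) = \frac{1}{n} \times n\mu = \mu \tag{15.6}$$
and
$$\mathrm{var}(\bar{x}) = \frac{1}{n^2}\big[n\sigma^2 + n(n-1) \times 0\big] = \frac{\sigma^2}{n}. \tag{15.7}$$
The standard error of $\bar{x}$ is, therefore,
$$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}. \tag{15.8}$$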
Case 2. Random sampling without replacements
As before, for each $i$,
$$E(x_i) = \mu \quad \text{and} \quad \mathrm{var}(x_i) = \sigma^2,$$
since here too $x_i$ can take one of the values $X_1, X_2, \ldots, X_N$ with
the same probability $\frac{1}{N}$. The covariance terms, however, need
special attention.
Here, for $i \ne j$,
$$P[x_i = X_\alpha, x_j = X_{\alpha'}] = P[x_i = X_\alpha] P[x_j = X_{\alpha'} \mid x_i = X_\alpha] = \frac{1}{N} \times \frac{1}{N-1} \quad \text{if } \alpha \ne \alpha'$$
(since $x_j$ can take any value except $X_\alpha$, the value which is known to
have been already assumed by $x_i$, with equal probability $\frac{1}{N-1}$),
and $= 0$ if $\alpha = \alpha'$.
Hence
$$\mathrm{cov}(x_i, x_j) = \frac{1}{N(N-1)} \sum_{\alpha \ne \alpha'} (X_\alpha - \mu)(X_{\alpha'} - \mu)$$
$$= \frac{1}{N(N-1)} \Big[\Big\{\sum_\alpha (X_\alpha - \mu)\Big\}^2 - \sum_\alpha (X_\alpha - \mu)^2\Big]$$
$$= -\frac{\sigma^2}{N-1}.$$

Thus, in this case we have
$$E(\bar{x}) = \frac{1}{n} \times n\mu = \mu \tag{15.9}$$
and
$$\mathrm{var}(\bar{x}) = \frac{1}{n^2}\Big[n\sigma^2 + n(n-1) \times \Big(-\frac{\sigma^2}{N-1}\Big)\Big] = \frac{\sigma^2}{n}\Big(1 - \frac{n-1}{N-1}\Big) = \frac{\sigma^2}{n} \cdot \frac{N-n}{N-1}. \tag{15.10}$$
Hence the standard error of $\bar{x}$ is
$$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} \sqrt{\frac{N-n}{N-1}}. \tag{15.11}$$

In both cases, the standard error decreases with increasing $n$.
The standard error of the mean in sampling without replacements is,
however, smaller than that in sampling with replacements. But the
difference becomes negligible if $N$ is very large compared to $n$. Also,
in sampling without replacements, the standard error of the sample
mean vanishes if $n = N$, which is to be expected because the sample
mean now becomes a constant, i.e. the same as the population mean.
However, this is not the case with sampling with replacements.
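Results (15.6), (15.7), (15.9) and (15.10) can be confirmed exactly for a small population by listing every equally likely ordered sample. The sketch below uses a hypothetical population of four values and samples of size 2; the function names are illustrative.

```python
from fractions import Fraction
from itertools import permutations, product

def moments(values):
    """Mean and variance (divisor = number of values) in exact arithmetic."""
    values = list(values)
    mean = sum(values, Fraction(0)) / len(values)
    variance = sum((v - mean) ** 2 for v in values) / len(values)
    return mean, variance

def sample_mean_moments(population, n, with_replacement):
    """Mean and variance of the sample mean over all equally likely ordered samples."""
    if with_replacement:
        draws = product(population, repeat=n)      # N^n ordered samples (SRSWR)
    else:
        draws = permutations(population, n)        # N(N-1)...(N-n+1) ordered samples (SRSWOR)
    return moments(Fraction(sum(s), n) for s in draws)

population = [2, 5, 7, 10]                         # hypothetical population values
N, n = len(population), 2
mu, sigma2 = moments(population)
mean_wr, var_wr = sample_mean_moments(population, n, True)
mean_wor, var_wor = sample_mean_moments(population, n, False)
print(mu, sigma2, var_wr, var_wor)
```

The variance without replacements should equal the variance with replacements multiplied by $(N-n)/(N-1)$.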

15.4 Expectation and standard error of sample proportion 

Suppose in a population of $N$ members, there are $Np$ members
with a particular character $A$ and $Nq$ members with the character
not-$A$. Then $p$ is the proportion of members in the population
having the character $A$. Let a sample of size $n$ be drawn from the
population, and let $f$ be the number of members in the sample
having the character $A$. To find the expectation and the standard
error of the sample proportion $f/n$, we adopt the following
procedure.

We assign to the $\alpha$th member of the population the value $X_\alpha$,
which is equal to 1 if this member possesses the character $A$ and
equal to 0 otherwise. Similarly, to the $i$th member of the sample we
assign the value $x_i$, which is equal to 1 if this member possesses $A$
and is equal to 0 otherwise.

In this way, we get a variable $x$, which has population mean
$$\frac{1}{N} \sum_\alpha X_\alpha = p$$
and population variance
$$\frac{1}{N} \sum_\alpha X_\alpha^2 - p^2 = p - p^2 = pq.$$
The sample mean of the variable $x$, on the other hand, is
$$\frac{1}{n} \sum_i x_i = \frac{f}{n}.$$
Hence we find, on replacing $\bar{x}$ by $f/n$, $\mu$ by $p$ and $\sigma^2$ by $pq$ in the
expressions (15.6), (15.8), (15.9) and (15.11), that
$$E(f/n) = p \tag{15.12}$$
and
$$\sigma_{f/n} = \sqrt{\frac{pq}{n}} \tag{15.13}$$
in the case of random sampling with replacements, and
$$E(f/n) = p \tag{15.14}$$
and
$$\sigma_{f/n} = \sqrt{\frac{pq}{n}} \sqrt{\frac{N-n}{N-1}} \tag{15.15}$$
in the case of random sampling without replacements.
The comments made in connection with the standard error of the
mean apply here also.

15.5 Random Sampling from a probability distribution 

We have defined random sampling in the context of a finite
population. But there will be many cases where the population has
to be considered infinite and hypothetical.


Example 15.1 One may be interested in knowing whether a given
coin is unbiassed, i.e. has the same probability, when thrown, for
falling heads as for falling tails. The population then comprises
the whole set of throws that may be made with the coin. This is
clearly infinite and hypothetical.


Example 15.2 Suppose one is interested in the average life (in,
say, hours) of a type of bulb produced by a bulb factory. Here the
population to which the average relates includes not only the bulbs,
of the given type, that the factory has in stock, but also all bulbs
that it produced in the past and sold out and all those that it may
produce in the future, besides all those bulbs that it might have
produced but did not. Here too the population is infinite and
hypothetical.


When the population is of this type, we shall define random
sampling in somewhat abstract terms. To indicate how this definition
is motivated, consider a population of size $N$ where $x$ has the
following frequency distribution:

Value of x    Frequency
$\xi_1$       $f_1$
$\xi_2$       $f_2$
...           ...
$\xi_k$       $f_k$
Total         $N$

Let $x_1, x_2, \ldots, x_n$ be the values of $x$ for the $n$ members of the
sample. For any $i$, we have, under both SRSWR and SRSWOR,
$$P[x_i = \xi_l] = f_l/N,$$
so that the marginal probability distribution of $x_i$, for any $i$, is the
same as the distribution of $x$ in the population.
Again, for $i \ne j$, we have, under SRSWR,
$$P[x_i = \xi_l, x_j = \xi_{l'}] = \frac{f_l}{N} \times \frac{f_{l'}}{N},$$
while under SRSWOR,
$$P[x_i = \xi_l, x_j = \xi_{l'}] = \begin{cases} \dfrac{f_l(f_l - 1)}{N(N-1)} & \text{if } l = l', \\[2ex] \dfrac{f_l f_{l'}}{N(N-1)} & \text{if } l \ne l'. \end{cases}$$

Consequently, any two (or more) of the sample observations are
independently distributed under SRSWR and, for large $N$, approximately
independently distributed under SRSWOR.

Taking these as essential features of random sampling, we
characterise an infinite population in terms of the probability distribution
of some random variable $x$ (which may as well be a vector).
By a random sample of size $n$ from this population (or from this probability
distribution) we mean a set of $n$ random variables, say
$x_1, x_2, \ldots, x_n$, such that $x_1, x_2, \ldots, x_n$ are independently distributed
and each has the same (marginal) probability distribution as
the probability distribution of $x$.

Hence (a) in the discrete case, if $f(x)$ is the probability-mass
function of $x$, then the joint probability-mass function of $x_1, x_2, \ldots, x_n$
will be supposed to be
$$\prod_{i=1}^{n} f(x_i).$$
Similarly, (b) in the continuous case, if $f(x)$ be the probability-density
function of $x$, then the joint probability-density function of
$x_1, x_2, \ldots, x_n$ will be supposed to be
$$\prod_{i=1}^{n} f(x_i).$$


15.6 Sampling distributions associated with discrete popula- 
tion distributions 

We shall here and in the next section derive some common 
sampling distributions that arise from random sampling from an 
infinite population. 

(a) Sampling distribution of sample total : binomial parent 

Suppose $x_1$ and $x_2$ are distributed independently in the binomial
form with parameters $m_1$, $p$ and $m_2$, $p$, respectively. Consider
then the distribution of the sum $x_1 + x_2$. Obviously, the values
this sum can take are $0, 1, 2, \ldots, m_1 + m_2$.

Also,
$$P[x_1 + x_2 = k] = \sum_{k_1=0}^{k} P[x_1 = k_1] P[x_2 = k - k_1]$$
$$= \sum_{k_1=0}^{k} \binom{m_1}{k_1} p^{k_1} (1-p)^{m_1 - k_1} \binom{m_2}{k - k_1} p^{k - k_1} (1-p)^{m_2 - k + k_1}$$
$$= p^k (1-p)^{m_1 + m_2 - k} \sum_{k_1=0}^{k} \binom{m_1}{k_1} \binom{m_2}{k - k_1}.$$
Now, this sum is nothing but the sum of products of the coefficients
of $t^{k_1}$ in $(1+t)^{m_1}$ and of $t^{k-k_1}$ in $(1+t)^{m_2}$, for varying $k_1$, and
hence equals the coefficient of $t^k$ in $(1+t)^{m_1+m_2}$, which is $\binom{m_1+m_2}{k}$.
Thus
$$P[x_1 + x_2 = k] = \binom{m_1 + m_2}{k} p^k (1-p)^{m_1 + m_2 - k}.$$

This shows that $x_1 + x_2$ is itself binomially distributed with
parameters $m_1 + m_2$ and $p$. We also get from this the general result
that if $x_1, x_2, \ldots, x_n$ are independently distributed binomial
variables with parameters $m_1, p$; $m_2, p$; $\ldots$; $m_n, p$, then the sum
$x_1 + x_2 + \cdots + x_n$ is also a binomial variable with parameters
$m_1 + m_2 + \cdots + m_n$ and $p$.

This implies that if $x_1, x_2, \ldots, x_n$ are a random sample from
a binomial distribution with parameters $m$ and $p$, then the sampling
distribution of the statistic $x_1 + x_2 + \cdots + x_n$ is also binomial with
parameters $nm$ and $p$.
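The additive result for binomial variables can be verified exactly for particular values by convolving the two probability-mass functions. The parameters below ($m_1 = 4$, $m_2 = 6$, $p = 1/3$) are hypothetical, and the function names are illustrative.

```python
from fractions import Fraction
from math import comb

def binomial_pmf(m, p):
    """List of P[x = k], k = 0, ..., m, for a binomial variable with parameters m, p."""
    return [comb(m, k) * p**k * (1 - p) ** (m - k) for k in range(m + 1)]

def convolve(pmf1, pmf2):
    """P.m.f. of the sum of two independent variables taking values 0, 1, 2, ..."""
    out = [Fraction(0)] * (len(pmf1) + len(pmf2) - 1)
    for i, a in enumerate(pmf1):
        for j, b in enumerate(pmf2):
            out[i + j] += a * b
    return out

p = Fraction(1, 3)            # hypothetical common value of p
m1, m2 = 4, 6                 # hypothetical values of m_1 and m_2
total = convolve(binomial_pmf(m1, p), binomial_pmf(m2, p))
print(total == binomial_pmf(m1 + m2, p))
```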

(b) Sampling distribution of sample total : Poisson parent 

Suppose $x_1$ and $x_2$ are distributed independently in the Poisson
form with parameters $\lambda_1$ and $\lambda_2$, respectively. The sum $x_1 + x_2$
can then take the values $0, 1, 2, \ldots$. Also,
$$P[x_1 + x_2 = k] = \sum_{k_1=0}^{k} P[x_1 = k_1] P[x_2 = k - k_1]$$
$$= \sum_{k_1=0}^{k} \exp[-\lambda_1] \frac{\lambda_1^{k_1}}{k_1!} \times \exp[-\lambda_2] \frac{\lambda_2^{k - k_1}}{(k - k_1)!}$$
$$= \exp[-\lambda_1 - \lambda_2] \frac{1}{k!} \sum_{k_1=0}^{k} \binom{k}{k_1} \lambda_1^{k_1} \lambda_2^{k - k_1}$$
$$= \exp[-(\lambda_1 + \lambda_2)] \frac{(\lambda_1 + \lambda_2)^k}{k!},$$
which shows that $x_1 + x_2$ is itself a Poisson variable with parameter
$\lambda_1 + \lambda_2$. It immediately follows that if $x_1, x_2, \ldots, x_n$ are independently
distributed Poisson variables with parameters $\lambda_1, \lambda_2, \ldots, \lambda_n$,
then the sum $x_1 + x_2 + \cdots + x_n$ is also a Poisson variable with
parameter $\lambda_1 + \lambda_2 + \cdots + \lambda_n$.

The above result gives, in particular, the sampling distribution of the
statistic $x_1 + x_2 + \cdots + x_n$ when $x_1, x_2, \ldots, x_n$ are a random sample
from a Poisson distribution with parameter $\lambda$. This sampling
distribution is also of the Poisson form with parameter $n\lambda$.
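A numerical check of the Poisson result, for hypothetical parameters $\lambda_1 = 1.5$ and $\lambda_2 = 2.5$, may be sketched as follows (the function names are illustrative).

```python
from math import exp, factorial

def poisson_pmf(lam, k):
    """P[x = k] for a Poisson variable with parameter lam."""
    return exp(-lam) * lam**k / factorial(k)

def sum_pmf(lam1, lam2, k):
    """P[x_1 + x_2 = k] obtained by summing over k_1 = 0, ..., k."""
    return sum(poisson_pmf(lam1, k1) * poisson_pmf(lam2, k - k1) for k1 in range(k + 1))

lam1, lam2 = 1.5, 2.5        # hypothetical parameters
max_gap = max(abs(sum_pmf(lam1, lam2, k) - poisson_pmf(lam1 + lam2, k)) for k in range(15))
print(max_gap)
```

The largest discrepancy should be of the order of floating-point rounding error only.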


15.7 Four fundamental distributions derived from the normal 
(a) Distribution of standard normal variable 
The definition of a standard normal variable has already been
given in Chapter 10; it is a normal variable with mean zero and
standard deviation unity. Thus the probability-density function
of the standard normal distribution is
$$f(\tau) = \frac{1}{\sqrt{2\pi}} \exp[-\tau^2/2], \tag{15.16}$$
where $-\infty < \tau < \infty$.
The properties of this distribution may be deduced from those of
a general normal distribution.
We shall denote by $\tau_\alpha$ the value of $\tau$ such that
$$P[\tau > \tau_\alpha] = \alpha. \tag{15.17}$$
It is called the upper $\alpha$-point (or the upper $100\alpha\%$-point) of a
standard normal variable. Because of the symmetry of the distribution
about zero, we have
$$\tau_{1-\alpha} = -\tau_\alpha.$$
Thus the lower $\alpha$-point, $\tau_{1-\alpha}$, of a standard normal variable,
which is the value of $\tau$ such that
$$P[\tau < \tau_{1-\alpha}] = \alpha, \tag{15.18}$$
is the same as the upper $\alpha$-point in magnitude but has the
opposite sign.
Fig. 15.1 shows the curve of this distribution.


Fig. 15.1 Distribution of a standard normal variable.
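The upper and lower $\alpha$-points defined in (15.17) and (15.18) are ordinarily read from tables; as a sketch, they can also be obtained from the inverse distribution function in Python's standard library. The function names are illustrative.

```python
from statistics import NormalDist

def upper_alpha_point(alpha):
    """tau_alpha such that P[tau > tau_alpha] = alpha."""
    return NormalDist().inv_cdf(1 - alpha)

def lower_alpha_point(alpha):
    """tau_{1-alpha} such that P[tau < tau_{1-alpha}] = alpha."""
    return NormalDist().inv_cdf(alpha)

upper = upper_alpha_point(0.05)
lower = lower_alpha_point(0.05)
print(round(upper, 4), round(lower, 4))
```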




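It follows from the theorem below that if $x$ is normally distributed
with mean $\mu$ and variance $\sigma^2$, then $(x - \mu)/\sigma$ is a standard
normal variable. Conversely, if $(x - \mu)/\sigma$ is a standard normal
variable, then $x$ is a normal variable with mean $\mu$ and variance $\sigma^2$.

Theorem 15.1 If $x$ is normally distributed with mean $\mu$ and
variance $\sigma^2$, then $y = a + bx$, where $b \ne 0$, is also normally distributed
with mean $a + b\mu$ and variance $b^2\sigma^2$.

Proof: Let us denote the p.d.f.'s of $x$ and $y$ by $f(x)$ and $g(y)$,
respectively.

Assuming $b > 0$, from the result
$$P[c < y < d] = P\Big[\frac{c-a}{b} < x < \frac{d-a}{b}\Big],$$
we have
$$\int_c^d g(y)\,dy = \int_{(c-a)/b}^{(d-a)/b} f(x)\,dx = \int_c^d f\Big(\frac{y-a}{b}\Big) \frac{1}{b}\,dy$$
(on making the transformation $y = a + bx$). If $b < 0$, we similarly
have, from
$$P[c < y < d] = P\Big[\frac{d-a}{b} < x < \frac{c-a}{b}\Big],$$
$$\int_c^d g(y)\,dy = \int_{(d-a)/b}^{(c-a)/b} f(x)\,dx = -\int_c^d f\Big(\frac{y-a}{b}\Big) \frac{1}{b}\,dy.$$
Combining the two results, we get*
$$g(y) = f\Big(\frac{y-a}{b}\Big) \Big|\frac{dx}{dy}\Big|.$$
But $\Big|\dfrac{dx}{dy}\Big| = \dfrac{1}{|b|}$ and $f(x) = \dfrac{1}{\sigma\sqrt{2\pi}} \exp[-(x - \mu)^2/2\sigma^2]$.

Hence
$$g(y) = \frac{1}{|b|\sigma\sqrt{2\pi}} \exp\Big[-\Big(\frac{y-a}{b} - \mu\Big)^2 \Big/ 2\sigma^2\Big] = \frac{1}{|b|\sigma\sqrt{2\pi}} \exp[-(y - a - b\mu)^2/2b^2\sigma^2],$$
which proves the theorem.


* This indicates why in each case one is to take in the p.d.f. of the new variables
$|J|$ rather than $J$, the Jacobian, itself.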




(b) $\chi^2$ distribution

Let $y_1, y_2, \ldots, y_\nu$ be $\nu$ mutually independent standard normal
variables. Then the sum of their squares
$$\sum_{i=1}^{\nu} y_i^2$$
is called a $\chi^2$ (chi-square) with $\nu$ degrees of freedom (or with df $= \nu$).
It has the probability-density function
$$f(\chi^2) = \frac{1}{2^{\nu/2}\,\Gamma(\nu/2)} \exp[-\chi^2/2]\,(\chi^2)^{\nu/2 - 1}, \tag{15.19}$$
where $0 < \chi^2 < \infty$.
The p.d.f. of $\chi$, the positive square-root of $\chi^2$, is immediately
found to be
$$\frac{1}{2^{\nu/2 - 1}\,\Gamma(\nu/2)} \exp[-\chi^2/2]\,\chi^{\nu - 1}, \tag{15.19a}$$
where $0 < \chi < \infty$.

For $\nu \le 2$ the density (15.19) steadily decreases as $\chi^2$ increases,
while for $\nu > 2$ there is a unique maximum at $\chi^2 = \nu - 2$. The distribution
is thus always positively skew. The curve of the distribution
of $\chi^2$ with df $= 7$ is shown in Fig. 15.2.

Fig. 15.2 $\chi^2$ distribution with 7 degrees of freedom ($\nu = 7$).

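Consider, to begin with, the distribution of the square of a single
standard normal variable, say
$$z = y^2.$$
$z$ varies from 0 to $\infty$ and for $0 < a < b < \infty$, we have, noting that the
transformation from $y$ to $z$ is two-to-one,
$$P[a < z < b] = P[\sqrt{a} < y < \sqrt{b}] + P[-\sqrt{b} < y < -\sqrt{a}].$$

Thus if $g(z)$ be the p.d.f. of $z$ and $f(y)$ that of $y$, then
$$\int_a^b g(z)\,dz = \int_{\sqrt{a}}^{\sqrt{b}} f(y)\,dy + \int_{-\sqrt{b}}^{-\sqrt{a}} f(y)\,dy = \int_a^b \big[f(\sqrt{z}) + f(-\sqrt{z})\big] \frac{1}{2\sqrt{z}}\,dz$$
(on putting $y = \sqrt{z}$ and $y = -\sqrt{z}$ in the first and the second integrals,
respectively).

Hence
$$g(z) = \big[f(\sqrt{z}) + f(-\sqrt{z})\big] \frac{1}{2\sqrt{z}} = 2 \times \frac{1}{\sqrt{2\pi}} \exp[-z/2] \times \frac{1}{2\sqrt{z}} = \frac{1}{2^{1/2}\,\Gamma(1/2)} \exp[-z/2]\,z^{1/2 - 1}.$$
Thus the result (15.19) is seen to be true for $\nu = 1$. If it is then
assumed to be true for $\nu = t$, the p.d.f. of $u = \sqrt{\sum_{i=1}^{t} y_i^2}$ is, from
(15.19a),
$$\frac{1}{2^{t/2 - 1}\,\Gamma(t/2)} \exp[-u^2/2]\,u^{t-1}, \quad 0 < u < \infty,$$
and that of $v = \sqrt{y_{t+1}^2}$ is, from (15.19a) again,
$$\frac{1}{2^{-1/2}\,\Gamma(1/2)} \exp[-v^2/2], \quad 0 < v < \infty.$$

The joint p.d.f. of $u$ and $v$ is, then,
$$\frac{1}{2^{(t-3)/2}\,\Gamma(t/2)\,\Gamma(1/2)} \exp[-(u^2 + v^2)/2]\,u^{t-1}.$$
Now make the one-to-one polar transformation:
$$u = u' \cos\theta, \quad v = u' \sin\theta \qquad (0 < u' < \infty,\ 0 < \theta < \pi/2).$$

Then $u' = \sqrt{u^2 + v^2} = \sqrt{\sum_{i=1}^{t+1} y_i^2}$.

Also, the Jacobian of the transformation is
$$J = \begin{vmatrix} \dfrac{\partial u}{\partial u'} & \dfrac{\partial u}{\partial \theta} \\[2ex] \dfrac{\partial v}{\partial u'} & \dfrac{\partial v}{\partial \theta} \end{vmatrix} = \begin{vmatrix} \cos\theta & -u'\sin\theta \\ \sin\theta & u'\cos\theta \end{vmatrix} = u'.$$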

Hence the joint p.d.f. of $u'$ and $\theta$ is
$$\frac{1}{2^{(t-3)/2}\,\Gamma(t/2)\,\Gamma(1/2)} \exp[-u'^2/2]\,(u')^t \cos^{t-1}\theta, \quad 0 < u' < \infty,\ 0 < \theta < \pi/2.$$

Since
$$2\int_0^{\pi/2} \cos^{t-1}\theta\,d\theta = B(1/2,\ t/2),$$
the p.d.f. of $u'$ is
$$\frac{1}{2^{(t-1)/2}\,\Gamma\big(\frac{t+1}{2}\big)} \exp[-u'^2/2]\,(u')^t,$$
whence the p.d.f. of $u'^2$ comes out to be
$$\frac{1}{2^{(t+1)/2}\,\Gamma\big(\frac{t+1}{2}\big)} \exp[-u'^2/2]\,(u'^2)^{(t+1)/2 - 1}.$$

The result (15.19) thus holds for $\nu = t + 1$ if it is assumed to hold
for $\nu = t$. Since it has been already shown to be valid for $\nu = 1$, by
mathematical induction it is found to hold for all positive integral
values of $\nu$.

An important result regarding the $\chi^2$ distribution is to be noted.
Let $Y_1$ and $Y_2$ be two independent variables distributed as $\chi^2$'s with
df equal to $\nu_1$ and $\nu_2$, respectively. Then the sum $Y_1 + Y_2$ may be
shown to be distributed in the same form with df $= \nu_1 + \nu_2$. This may
be regarded as a consequence of the definition of $\chi^2$, since $Y_1 + Y_2$ is
the sum of squares of $\nu_1 + \nu_2$ mutually independent standard normal
variables. A direct proof may be given by considering the joint
distribution of $Y_1$ and $Y_2$ and deriving from it the joint distribution
of $Y = Y_1 + Y_2$ and $\theta$, which are such that
$$\sqrt{Y_1} = \sqrt{Y} \cos\theta, \quad \sqrt{Y_2} = \sqrt{Y} \sin\theta \qquad (0 < Y < \infty,\ 0 < \theta < \pi/2).$$
This property is designated as the additive property of $\chi^2$'s.

For large $\nu$, $\sqrt{2\chi^2}$ can be shown to be approximately normally
distributed with mean $\sqrt{2\nu - 1}$ and standard deviation 1. This
approximation is generally used to calculate values of $\chi^2$ at different
probability levels for $\nu > 30$.
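The normal approximation just stated leads to the working rule $\chi^2_{\alpha,\nu} \approx \frac{1}{2}\big(\tau_\alpha + \sqrt{2\nu - 1}\big)^2$ for an upper $\alpha$-point, obtained by equating $\sqrt{2\chi^2} - \sqrt{2\nu - 1}$ to the standard normal upper $\alpha$-point $\tau_\alpha$. A minimal sketch follows; the function name is illustrative and the case $\nu = 30$ is chosen only as an example.

```python
from math import sqrt
from statistics import NormalDist

def chi_square_upper_point_approx(alpha, nu):
    """Approximate upper alpha-point of chi-square with nu d.f., using
    sqrt(2*chi2) ~ Normal(sqrt(2*nu - 1), 1)."""
    tau_alpha = NormalDist().inv_cdf(1 - alpha)
    return 0.5 * (tau_alpha + sqrt(2 * nu - 1)) ** 2

approx_30 = chi_square_upper_point_approx(0.05, 30)
print(round(approx_30, 2))
```

For $\nu = 30$ and $\alpha = 0.05$ the approximation gives about 43.49, against a tabulated value of about 43.77, which indicates the accuracy to be expected near the lower end of the range $\nu > 30$.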


We shall denote by $\chi^2_{\alpha,\nu}$ the value of $\chi^2$ (with df $=\nu$) for which
\[
P[\chi^2 > \chi^2_{\alpha,\nu}] = \alpha. \qquad (15.20)
\]
$\chi^2_{\alpha,\nu}$ is then the upper $\alpha$-point of the $\chi^2$ distribution with df $=\nu$,
while the lower $\alpha$-point is $\chi^2_{1-\alpha,\nu}$.

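(c) t distribution

If $y$ be a standard normal variable and $Y$ a chi-square with df $=\nu$,
distributed independently of $y$, then the new variable
\[
t=\frac{y}{\sqrt{Y/\nu}}
\]
is called a $t$ with $\nu$ degrees of freedom. It has the distribution
\[
f(t)=\frac{1}{\sqrt{\nu}\,B(1/2,\ \nu/2)}\left(1+\frac{t^2}{\nu}\right)^{-(\nu+1)/2}, \qquad (15.21)
\]
where $-\infty<t<\infty$.

Let us start from the joint density function of $y$ and $Y$, which is
\[
\frac{1}{\sqrt{2\pi}\,2^{\nu/2}\,\Gamma(\nu/2)}\,Y^{(\nu-2)/2}\exp[-y^2/2]\exp[-Y/2], \qquad -\infty<y<\infty,\ 0<Y<\infty.
\]
Making the one-to-one transformation
\[
t=\frac{y}{\sqrt{Y/\nu}}, \quad u=Y \qquad (-\infty<t<\infty,\ 0<u<\infty),
\]
so that $y=t\sqrt{u/\nu}$, $Y=u$, and noting that
\[
J=\begin{vmatrix} \partial y/\partial t & \partial y/\partial u \\ \partial Y/\partial t & \partial Y/\partial u \end{vmatrix}
=\begin{vmatrix} \sqrt{u/\nu} & t/(2\sqrt{u\nu}) \\ 0 & 1 \end{vmatrix}=\sqrt{u/\nu},
\]
we have the joint p.d.f. of $t$ and $u$ as
\[
\frac{1}{\sqrt{2\pi\nu}\,2^{\nu/2}\,\Gamma(\nu/2)}\,u^{(\nu-1)/2}\exp\left[-\frac{u}{2}\left(1+\frac{t^2}{\nu}\right)\right], \qquad -\infty<t<\infty,\ 0<u<\infty.
\]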


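The p.d.f. of $t$ is, therefore,
\[
f(t)=\frac{1}{\sqrt{2\pi\nu}\,2^{\nu/2}\,\Gamma(\nu/2)}\int_0^\infty u^{(\nu-1)/2}\exp\left[-\frac{u}{2}\left(1+\frac{t^2}{\nu}\right)\right]du
\]
\[
=\frac{\Gamma\!\left(\frac{\nu+1}{2}\right)2^{(\nu+1)/2}}{\sqrt{2\pi\nu}\,2^{\nu/2}\,\Gamma(\nu/2)}\left(1+\frac{t^2}{\nu}\right)^{-(\nu+1)/2}
\]
\[
=\frac{1}{\sqrt{\nu}\,B(1/2,\ \nu/2)}\left(1+\frac{t^2}{\nu}\right)^{-(\nu+1)/2}.
\]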

Like the standard normal distribution, the $t$ distribution is
symmetrical about $t=0$. But unlike the normal distribution, it has
$\gamma_2>0$; i.e., it is more peaked than a normal distribution with the
same standard deviation.

The symbol $t_{\alpha,\nu}$ will be used to denote the value of $t$ (with df $=\nu$)
such that
\[
P[t > t_{\alpha,\nu}] = \alpha. \qquad (15.22)
\]
Owing to the symmetry of the distribution,
\[
t_{1-\alpha,\nu} = -t_{\alpha,\nu}. \qquad (15.23)
\]
For small $\nu$, the $t$ distribution differs considerably from the
standard normal distribution, $t_{\alpha,\nu}$ being always greater than $\tau_\alpha$
if $0<\alpha<1/2$. For large values of $\nu$, however, the $t$ distribution
tends to the standard normal form and $t_{\alpha,\nu}$ may then be well
approximated by $\tau_\alpha$.
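The density (15.21) is easy to evaluate numerically, which gives a quick check of the symmetry about $t=0$ and of the approach to the standard normal form for large $\nu$. The sketch below is only illustrative; the function names `t_pdf` and `std_normal_pdf` are assumptions, and the beta function is written out through gamma functions.

```python
from math import exp, gamma, pi, sqrt


def t_pdf(t: float, nu: int) -> float:
    """Density (15.21) of t with nu degrees of freedom."""
    # B(1/2, nu/2) expressed through gamma functions.
    beta = gamma(0.5) * gamma(nu / 2.0) / gamma((nu + 1) / 2.0)
    return (1.0 + t * t / nu) ** (-(nu + 1) / 2.0) / (sqrt(nu) * beta)


def std_normal_pdf(t: float) -> float:
    """Density of the standard normal distribution."""
    return exp(-t * t / 2.0) / sqrt(2.0 * pi)


for nu in (5, 30, 200):
    # The gap between the two densities at t = 1 shrinks as nu grows.
    print(nu, t_pdf(1.0, nu) - std_normal_pdf(1.0))
```

Note that `math.gamma` overflows for arguments beyond roughly 171, so this direct form is only suitable for moderate $\nu$.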


Fig. 15.3 $t$ distribution with 5 degrees of freedom ($\nu=5$); vertical axis: probability density.



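(d) F distribution

Let $Y_1$ and $Y_2$ be independently distributed as $\chi^2$s with $\nu_1$ and
$\nu_2$ degrees of freedom, respectively. The random variable
\[
F=\frac{Y_1/\nu_1}{Y_2/\nu_2}
\]
is then called an $F$ with $(\nu_1, \nu_2)$ degrees of freedom. This has the
distribution with p.d.f.
\[
f(F)=\frac{(\nu_1/\nu_2)^{\nu_1/2}}{B(\nu_1/2,\ \nu_2/2)}\,F^{(\nu_1-2)/2}\left(1+\frac{\nu_1}{\nu_2}F\right)^{-(\nu_1+\nu_2)/2}, \qquad (15.24)
\]
where $0<F<\infty$.

To derive this result, note that the joint p.d.f. of $Y_1$ and $Y_2$ is,
from (15.19),
\[
g(Y_1, Y_2)=\frac{1}{2^{(\nu_1+\nu_2)/2}\,\Gamma(\nu_1/2)\,\Gamma(\nu_2/2)}\,Y_1^{(\nu_1-2)/2}\,Y_2^{(\nu_2-2)/2}\exp\left[-\frac{Y_1+Y_2}{2}\right],
\]
\[
0<Y_1<\infty,\ 0<Y_2<\infty.
\]
Let us now make the one-to-one transformation
\[
F=\frac{Y_1/\nu_1}{Y_2/\nu_2}, \quad u=Y_2 \qquad (0<F<\infty,\ 0<u<\infty),
\]
so that
\[
Y_1=\frac{\nu_1}{\nu_2}Fu \quad \text{and} \quad Y_2=u.
\]
The Jacobian of the transformation is
\[
J=\begin{vmatrix} \partial Y_1/\partial F & \partial Y_1/\partial u \\ \partial Y_2/\partial F & \partial Y_2/\partial u \end{vmatrix}
=\begin{vmatrix} \frac{\nu_1}{\nu_2}u & \frac{\nu_1}{\nu_2}F \\ 0 & 1 \end{vmatrix}=\frac{\nu_1}{\nu_2}u.
\]
Hence the joint p.d.f. of $F$ and $u$ is
\[
h(F, u)=\frac{1}{2^{(\nu_1+\nu_2)/2}\,\Gamma(\nu_1/2)\,\Gamma(\nu_2/2)}\left(\frac{\nu_1}{\nu_2}Fu\right)^{(\nu_1-2)/2}u^{(\nu_2-2)/2}\exp\left[-\frac{u}{2}\left(1+\frac{\nu_1}{\nu_2}F\right)\right]\frac{\nu_1}{\nu_2}u
\]
\[
=\frac{(\nu_1/\nu_2)^{\nu_1/2}\,F^{(\nu_1-2)/2}\,u^{(\nu_1+\nu_2-2)/2}}{2^{(\nu_1+\nu_2)/2}\,\Gamma(\nu_1/2)\,\Gamma(\nu_2/2)}\exp\left[-\frac{u}{2}\left(1+\frac{\nu_1}{\nu_2}F\right)\right],
\qquad 0<F<\infty,\ 0<u<\infty.
\]
The p.d.f. of $F$ is, therefore,
\[
f(F)=\int_0^\infty h(F, u)\,du
=\frac{(\nu_1/\nu_2)^{\nu_1/2}\,F^{(\nu_1-2)/2}}{2^{(\nu_1+\nu_2)/2}\,\Gamma(\nu_1/2)\,\Gamma(\nu_2/2)}\times\frac{\Gamma\!\left(\frac{\nu_1+\nu_2}{2}\right)2^{(\nu_1+\nu_2)/2}}{\left(1+\frac{\nu_1}{\nu_2}F\right)^{(\nu_1+\nu_2)/2}}
\]
\[
=\frac{(\nu_1/\nu_2)^{\nu_1/2}}{B(\nu_1/2,\ \nu_2/2)}\cdot\frac{F^{(\nu_1-2)/2}}{\left(1+\frac{\nu_1}{\nu_2}F\right)^{(\nu_1+\nu_2)/2}}, \qquad 0<F<\infty.
\]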


This distribution is highly positively skew. It is easily seen from
the definitions of $t$ and $F$ that an $F$ with $\nu_1=1$ is a $t^2$, $t$ having
df $=\nu_2$.

As in the previous cases, we shall denote by $F_{\alpha;\,\nu_1,\nu_2}$ the upper
$\alpha$-point of the $F$ distribution with df $=(\nu_1, \nu_2)$; i.e.,
\[
P[F>F_{\alpha;\,\nu_1,\nu_2}]=\alpha. \qquad (15.25)
\]


Fig. 15.4 $F$ distribution with (10, 4) degrees of freedom ($\nu_1=10$, $\nu_2=4$); vertical axis: probability density.

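As regards the lower $\alpha$-point $F_{1-\alpha;\,\nu_1,\nu_2}$, we see that
\[
P[F<F_{1-\alpha;\,\nu_1,\nu_2}]=\alpha
\]
or
\[
P\left[\frac{1}{F}>\frac{1}{F_{1-\alpha;\,\nu_1,\nu_2}}\right]=\alpha.
\]
Now $1/F$, which is of the form $\dfrac{Y_2/\nu_2}{Y_1/\nu_1}$, is itself distributed as an $F$
with df $=(\nu_2, \nu_1)$. It follows that
\[
\frac{1}{F_{1-\alpha;\,\nu_1,\nu_2}}=F_{\alpha;\,\nu_2,\nu_1}
\]
or
\[
F_{1-\alpha;\,\nu_1,\nu_2}=\frac{1}{F_{\alpha;\,\nu_2,\nu_1}}. \qquad (15.26)
\]
It is, therefore, unnecessary to tabulate the lower $\alpha$-points of $F$ distributions
with various df's, once the upper $\alpha$-points are tabulated.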
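The reciprocal relation behind (15.26) can be checked numerically: for any $c>0$, the probability that an $F$ with df $(\nu_1,\nu_2)$ falls below $1/c$ should equal the probability that an $F$ with df $(\nu_2,\nu_1)$ exceeds $c$. The sketch below integrates the density (15.24) by Simpson's rule; the function names, the choice $c=2$ and the number of integration steps are illustrative assumptions.

```python
from math import gamma


def f_pdf(F: float, nu1: int, nu2: int) -> float:
    """Density (15.24) of F with (nu1, nu2) degrees of freedom."""
    beta = gamma(nu1 / 2.0) * gamma(nu2 / 2.0) / gamma((nu1 + nu2) / 2.0)
    ratio = nu1 / nu2
    return (ratio ** (nu1 / 2.0) * F ** ((nu1 - 2) / 2.0)
            / (beta * (1.0 + ratio * F) ** ((nu1 + nu2) / 2.0)))


def f_cdf(c: float, nu1: int, nu2: int, steps: int = 2000) -> float:
    """P[F <= c] by Simpson's rule on [0, c]; assumes nu1 >= 2 and an even number of steps."""
    h = c / steps
    total = f_pdf(0.0, nu1, nu2) + f_pdf(c, nu1, nu2)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * f_pdf(k * h, nu1, nu2)
    return total * h / 3.0


c = 2.0
lower_tail = f_cdf(1.0 / c, 10, 4)        # P[F(10, 4) < 1/c]
upper_tail = 1.0 - f_cdf(c, 4, 10)        # P[F(4, 10) > c]
print(lower_tail, upper_tail)
```

The two printed probabilities should agree up to the numerical integration error.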


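15.8 Sampling distributions of mean and variance in random
sampling from a normal distribution

Let $x_1, x_2, \ldots, x_n$ be a random sample from a normal distribution
whose p.d.f. is
\[
f(x)=\frac{1}{\sigma\sqrt{2\pi}}\exp[-(x-\mu)^2/2\sigma^2], \qquad -\infty<x<\infty.
\]
We shall denote the sample mean and the sample variance of $x$ by $\bar{x}$
and $s'^2$, respectively. Thus
\[
\bar{x}=\frac{1}{n}\sum_{i=1}^{n}x_i
\quad \text{and} \quad
s'^2=\frac{1}{n-1}\sum_{i=1}^{n}(x_i-\bar{x})^2.
\]
In order to obtain the sampling distributions of $\bar{x}$ and $s'^2$, we
start from the joint p.d.f. of $x_1, x_2, \ldots, x_n$, which is
\[
f(x_1, x_2, \ldots, x_n)=\frac{1}{(\sigma\sqrt{2\pi})^n}\exp\left[-\sum_{i=1}^{n}(x_i-\mu)^2/2\sigma^2\right].
\]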


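We make the following one-to-one transformation from $x_i$ ($i=1, 2, \ldots, n$) to $y_i$ ($i=1, 2, \ldots, n$):
\[
y_1=\frac{1}{\sqrt{n}}\left[(x_1-\mu)+(x_2-\mu)+\cdots+(x_n-\mu)\right]/\sigma,
\]
\[
y_i=\left[a_{i1}(x_1-\mu)+a_{i2}(x_2-\mu)+\cdots+a_{in}(x_n-\mu)\right]/\sigma, \quad \text{for } i=2, 3, \ldots, n, \qquad (15.27)
\]
where the $(n-1)$ vectors $(a_{i1}, a_{i2}, \ldots, a_{in})$ are of unit length,
mutually orthogonal and each orthogonal to the vector
\[
\left(\frac{1}{\sqrt{n}}, \frac{1}{\sqrt{n}}, \ldots, \frac{1}{\sqrt{n}}\right).
\]
One such set of vectors is:
\[
\frac{1}{\sqrt{2}}(1, -1, 0, 0, \ldots, 0, 0),
\]
\[
\frac{1}{\sqrt{6}}(1, 1, -2, 0, \ldots, 0, 0),
\]
\[
\ldots\ldots
\]
\[
\frac{1}{\sqrt{n(n-1)}}(1, 1, 1, 1, \ldots, 1, -(n-1)).
\]
For this transformation,
\[
\frac{1}{J}=\begin{vmatrix}
\partial y_1/\partial x_1 & \partial y_1/\partial x_2 & \cdots & \partial y_1/\partial x_n \\
\partial y_2/\partial x_1 & \partial y_2/\partial x_2 & \cdots & \partial y_2/\partial x_n \\
\vdots & \vdots & & \vdots \\
\partial y_n/\partial x_1 & \partial y_n/\partial x_2 & \cdots & \partial y_n/\partial x_n
\end{vmatrix}=\pm\frac{1}{\sigma^n}, \qquad (15.28)
\]
implying that $J=\pm\sigma^n$ and $|J|=\sigma^n$.
Further,
\[
\sum_{i=1}^{n}y_i^2=\sum_{i=1}^{n}(x_i-\mu)^2/\sigma^2.
\]
Hence the joint p.d.f. of $y_1, y_2, \ldots, y_n$ is
\[
\frac{1}{(\sqrt{2\pi})^n}\exp\left[-\sum_{i=1}^{n}y_i^2/2\right].
\]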


This shows that $y_1, y_2, \ldots, y_n$ are independently and identically
distributed, each being a standard normal variable.


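But
\[
y_1=\frac{\sqrt{n}(\bar{x}-\mu)}{\sigma},
\]
a linear function of $\bar{x}$. Since $y_1$ is a standard normal variable, $\bar{x}$ must
be a normal variable with mean $\mu$ and variance $\sigma^2/n$ (from Theorem
15.1). Thus the p.d.f. of $\bar{x}$ is
\[
g(\bar{x})=\frac{\sqrt{n}}{\sigma\sqrt{2\pi}}\exp[-n(\bar{x}-\mu)^2/2\sigma^2], \qquad -\infty<\bar{x}<\infty. \qquad (15.29)
\]
Again,
\[
\sum_{i=2}^{n}y_i^2=\sum_{i=1}^{n}y_i^2-y_1^2
=\sum_{i=1}^{n}(x_i-\mu)^2/\sigma^2-n(\bar{x}-\mu)^2/\sigma^2
=\sum_{i=1}^{n}(x_i-\bar{x})^2/\sigma^2
=(n-1)s'^2/\sigma^2. \qquad (15.30)
\]
Now $\sum_{i=2}^{n}y_i^2$, being the sum of squares of $(n-1)$ independent
standard normal variables, is a $\chi^2$ with df $=(n-1)$, and this is
distributed independently of $y_1$. It follows that $(n-1)s'^2/\sigma^2$ is
distributed as a $\chi^2$ with df $=(n-1)$ and is independent of $\bar{x}$. From
(15.19) the p.d.f. of $s'^2$ is, therefore, obtained as
\[
\frac{(n-1)^{(n-1)/2}}{(2\sigma^2)^{(n-1)/2}\,\Gamma\!\left(\frac{n-1}{2}\right)}\exp[-(n-1)s'^2/2\sigma^2]\,(s'^2)^{(n-3)/2}, \qquad 0<s'^2<\infty. \qquad (15.31)
\]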
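The algebra of the orthogonal transformation (15.27) can be verified on a small numerical example. The sketch below builds the particular set of vectors quoted above, applies the transformation to a sample, and checks that $y_1=\sqrt{n}(\bar{x}-\mu)/\sigma$ and that $\sum_{i=2}^{n}y_i^2=(n-1)s'^2/\sigma^2$, as in (15.30). The sample values, $\mu$ and $\sigma$ are made up for illustration, and the function names are assumptions.

```python
from math import isclose, sqrt


def orthogonal_rows(n: int) -> list:
    """Rows of the n x n orthogonal matrix: (1/sqrt(n), ...) followed by the contrasts of the text."""
    rows = [[1.0 / sqrt(n)] * n]
    for i in range(2, n + 1):
        scale = 1.0 / sqrt(i * (i - 1))
        # (1, ..., 1, -(i-1), 0, ..., 0) / sqrt(i(i-1)), with (i-1) leading ones.
        rows.append([scale] * (i - 1) + [-(i - 1) * scale] + [0.0] * (n - i))
    return rows


def transform(xs: list, mu: float, sigma: float) -> list:
    """Apply (15.27): y = A (x - mu) / sigma."""
    z = [(x - mu) / sigma for x in xs]
    return [sum(a * zj for a, zj in zip(row, z)) for row in orthogonal_rows(len(xs))]


xs = [4.1, 5.3, 6.0, 4.7, 5.9]          # illustrative observations
mu, sigma = 5.0, 0.8                    # illustrative parameter values
n = len(xs)
ys = transform(xs, mu, sigma)
xbar = sum(xs) / n
s2 = sum((x - xbar) ** 2 for x in xs) / (n - 1)

print(isclose(ys[0], sqrt(n) * (xbar - mu) / sigma))
print(isclose(sum(y * y for y in ys[1:]), (n - 1) * s2 / sigma ** 2))
```

The check is purely algebraic; it illustrates the identities used in the derivation and says nothing by itself about the distributions involved.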


Questions and exercises 


15.1 Define simple random sampling. Describe some practical 
methods of drawing a random sample from a finite population. 

15.2 Explain the terms ‘parameter’, ‘statistic’, ‘sampling distri- 
bution’ and ‘standard error of a statistic’. 

15.3 Obtain the expectation and standard error of sample mean
for a random sample of size n drawn from a population of size M
(a) with replacements, (b) without replacements.

15.4 Show that for a random sample of size 100, drawn with
replacements, the standard error of sample proportion cannot
exceed 0.05.

15.5 Suppose that a statistic $T$ is normally distributed with mean
$\theta$. Show that it is very unlikely that the percentage error in estimating
$\theta$ by $T$ will be greater than 3 times the coefficient of variation
of $T$. [Note that in this case the coefficient of variation is $100\sigma_T/\theta$.]

15.6 Identify the distributions of $\chi^2$, $t$ and $F$ as members of the
Pearsonian family.

15.7 Starting from the density function of a standard normal
variable, obtain that of a $\chi^2$ with df $=\nu$.

15.8 Starting from the definition of the $t$-statistic with df $=\nu$,
obtain its density function.

15.9 Define an $F$ with df $=(\nu_1, \nu_2)$. Hence obtain its probability
density function.

15.10 Show that $\bar{x}$ and $s'^2$ for random samples of size n from a
normal population are distributed independently of each other.
Also, show that $\bar{x}$ is distributed normally with mean $\mu$ and variance
$\sigma^2/n$, while $(n-1)s'^2/\sigma^2$ is distributed as a $\chi^2$ with df $=(n-1)$.

15.11 Proceeding as in Section 15.8, show that if $x_i$ ($i=1, 2, \ldots, n$)
are distributed independently and normally with means $\mu_i$ ($i=1, 2, \ldots, n$)
and variances $\sigma_i^2$ ($i=1, 2, \ldots, n$), then the linear function
$a+\sum_i b_i x_i$ (where at least one $b_i$ is non-zero) is normally distributed
with mean $a+\sum_i b_i\mu_i$ and variance $\sum_i b_i^2\sigma_i^2$.
T 
15.12 If $x$ has the exponential distribution with p.d.f.
\[
f(x)=\theta\exp(-\theta x), \qquad 0<x<\infty,
\]
where $\theta>0$, then what is the distribution of $\sum_i x_i$, $x_i$ ($i=1, 2, \ldots, n$)
being a random sample from this distribution?
15.13 If $x$ is distributed in the rectangular form with p.d.f.
\[
f(x)=\frac{1}{\theta}, \qquad 0<x<\theta,
\]
show that $-2\ln(x/\theta)$ is a $\chi^2$ with df $=2$.


15.14 If $Y_1$ and $Y_2$ are independent $\chi^2$s with df $=\nu_1$ and $\nu_2$, find
the joint distribution of $Y_1+Y_2$ and $Y_1/Y_2$. Show that $Y_1+Y_2$ is
itself a $\chi^2$, while $Y_1/Y_2$ is of the form $\frac{\nu_1}{\nu_2}F$, and that these two are also
mutually independent.

[Hint: Make the polar transformation:
\[
\sqrt{Y_1}=\sqrt{Y}\cos\theta, \quad \sqrt{Y_2}=\sqrt{Y}\sin\theta.
\]
Note that $Y=Y_1+Y_2$ and $\cot^2\theta=Y_1/Y_2$.]

15.15 Denoting by $X_\lambda$ a Poisson variable with parameter $\lambda$ and
by $\chi^2_k$ a $\chi^2$ variable with df $=k$, establish the following identity, for
all positive integers $k$:
\[
P[X_\lambda \le k-1]=P[\chi^2_{2k}>2\lambda].
\]

15.16 Denoting by $X_{n,p}$ a binomial variable with parameters
$(n, p)$ and by $F_{\nu_1,\nu_2}$ an $F$ statistic with $\nu_1$ and $\nu_2$ degrees of freedom,
establish the following identity:
\[
P[X_{n,p}\le k-1]=P\left[F_{2k,\,2(n-k+1)}>\frac{n-k+1}{k}\times\frac{p}{1-p}\right].
\]
15.17 Let $x_i$ ($i=1, 2, \ldots, n$) be a random sample from a
continuous distribution with p.d.f. $f(x)$. If $y$ and $z$ be the smallest
and the largest of the observations, show that $y$ has the p.d.f.
\[
n f(y)\left[\int_y^{\infty}f(x)\,dx\right]^{n-1},
\]
while $z$ has the p.d.f.
\[
n f(z)\left[\int_{-\infty}^{z}f(x)\,dx\right]^{n-1}.
\]

15.18 (Continuation) Show that in random sampling from the
exponential distribution $f(x)$ of Exercise 15.12, $y$ and $z$ also have
exponential marginal distributions.

15.19 Suppose $x_1, x_2, \ldots, x_{2n}$ are independently distributed
normal variables with common mean $\mu$ and common variance $\sigma^2$.
Find the distributions of

(a) $[x_1+x_2+\cdots+x_n-x_{n+1}-x_{n+2}-\cdots-x_{2n}]/\sqrt{2n}$,
(b) $(x_1-x_{n+1})^2+(x_2-x_{n+2})^2+\cdots+(x_n-x_{2n})^2$.


15.20 Let $x_1$, $x_2$, $x_3$ and $x_4$ be independently distributed standard
normal variables. Obtain, by using the tables in Appendix A,
the probabilities

(a) $P[x_1+2x_2+3x_3>65]$,

(b) $P[x_1-2x_2+3x_3<8]$,

(c) $P[x_1^2+x_2^2>7.815-x_3^2]$,

(d) $P[2x_1^2<(4.303)^2[x_2^2+x_3^2]]$,

(e) $P[x_1^2-19x_2^2<19x_3^2-x_4^2]$.


SUGGESTED READING 


[1] Goon, A. M., Gupta, M. K. and Dasgupta, B. An Outline of 
Statistical Theory, Vol. I (Ch. 10). World Press, 1977. 

[2] Hogg, R. V. and Craig, A. T. Introduction to Mathematical
Statistics (Chs. 3, 4). Macmillan, 1970, and Amerind, 1972.

[3] Keeping, E. S. Introduction to Statistical Inference (Ch. 8). Van
Nostrand, 1962, and Affiliated East-West Press.

[4] Mood, A. M., Graybill, F. A. and Boes, D. C. Introduction to
the Theory of Statistics (Chs. 5, 6). McGraw-Hill, 1974, and
Kogakusha.

[5] Rao, C. R. Advanced Statistical Methods in Biometric Research 
(Ch, 2). John Wiley, 1952. 

[6] Yule, G. U. and Kendall, M. G. An Introduction to the Theory of 
Statistics (Ch. 14), Charles Griffin, 1953. 


16 BASIC PRINCIPLES OF STATISTICAL INFERENCE


16.1 Estimation and testing of hypotheses 

Since a sample is but a part of a population, the features of the
former will generally differ from those of the latter. The question 
that naturally arises is then : what can be said about the properties 
of the population from a knowledge of the properties of the sample ? 
Although an answer to this question may not be found in all cases, 
in the case of random sampling it can be answered with the help of 
probability theory. In sampling theory, we are primarily concerned 
with this very question. The process of going from the known sample 
to the unknown population has been called statistical inference. 

The basic problem of sampling theory usually presents itself in 
one of two forms: (a) Some feature of the population in which an 
enquirer is interested may be completely unknown to him, and he 
may want to make a guess about this feature completely on the basis 
of a random sample from the population. (b) Some information 
as to the feature of the population may be available to the enquirer, 
and he may want to see whether the information is tenable in the 
light of the random sample taken from the population. The first 
type of problem is the problem of estimation and the second the 
problem of testing of hypotheses. 

We shall assume in this chapter and in Chapter 17 that the form
of the population distribution (binomial, normal, etc.) is either
known or is not of importance to the enquirer in the particular
context, in which case he will be interested in some unknown parameter
or parameters of the distribution. The usual problem is then
to estimate the unknown parameters or to test some hypotheses
regarding these parameters on the basis of the given sample. In
later chapters, we shall also consider the problem of testing hypotheses
regarding the form of a population distribution.


16.2 Point estimation of parameters 
Let $\theta$ be an unknown parameter of the distribution of a variable
$x$. For estimating $\theta$ on the basis of a random sample, $x_1, x_2, \ldots, x_n$,
we may use a particular statistic $T$. Then $T$ is the estimator of $\theta$, and
the value of $T$ obtained from a given sample is its estimate. Clearly,
for $T$ to be a good estimator, the difference $|T-\theta|$ should be as small
as possible. However, since $T$ is itself a random variable, all that
we can hope to ensure is that the difference be small with a high
probability.
Unbiassedness and minimum variance 
One way of achieving this would be to see that the sampling
distribution of $T$ has a central tendency towards $\theta$ and a small
dispersion. If we agreed to accept the mean as the proper measure
of central tendency and the variance as the proper measure of
dispersion, then we would want that $T$ should be unbiassed*, i.e.,
\[
E(T)=\theta, \text{ whatever the true value of } \theta \text{ may be,} \qquad (16.1)
\]
and that, among all unbiassed estimators, $T$ should have the smallest
variance, i.e.
\[
\mathrm{var}(T)\le\mathrm{var}(T'), \text{ whatever the true value of } \theta \text{ may be,} \qquad (16.2)
\]
where $T'$ is any other unbiassed estimator.
A statistic $T$ of this type is called a minimum-variance unbiassed
estimator of $\theta$.


Example 16.1 It has been shown (in Section 15.4) that if we
consider Bernoullian trials with probability of success $p$ for each
trial and if $f$ be the number of successes in $n$ such trials, then
\[
E(f/n)=p,
\]
whatever the true value of $p$. Hence $f/n$ is an unbiassed estimator
of $p$. It can also be shown that, among all unbiassed estimators,
$f/n$ has the smallest variance. Hence $f/n$ is a minimum-variance
unbiassed estimator of $p$.


Example 16.2 If $x_1, x_2, \ldots, x_n$ be a random sample from a
population with mean $\mu$ and if $\bar{x}$ be the sample mean, then
\[
E(\bar{x})=\mu,
\]
whatever the true value of $\mu$, so that $\bar{x}$ is an unbiassed estimator of $\mu$
(vide Section 15.3).

*If $E(T)=\theta+b(\theta)$, then $b(\theta)$ is the bias of $T$.

Suppose further that the observations are not only random but
also independent and that the population is normal with mean $\mu$
and variance $\sigma^2$. Here it can be shown that $\bar{x}$ has the least variance
among all unbiassed estimators of $\mu$; i.e., $\bar{x}$ is a minimum-variance
unbiassed estimator of $\mu$.


Consistency and efficiency 
An alternative approach would be to demand that the estimator
should behave more and more satisfactorily as the sample size $n$
becomes larger and larger. In particular, it may be required that
the values of $T$, which purports to be a good estimator, should be
more and more clustered around $\theta$ with increasing sample size. To
put it in probabilistic terms, it may be required that the statistic $T$
should converge stochastically (or in probability) to $\theta$ as $n\to\infty$. In other
words, given two positive quantities, $\epsilon$ and $\eta$, however small, it
should be possible to find an $n_0$, depending on $\epsilon$ and $\eta$, such that
\[
P[\,|T-\theta|<\epsilon\,]>1-\eta \qquad (16.3)
\]
whenever $n>n_0$. A statistic $T$ with this property is called a consistent
estimator of $\theta$.
It can be shown that a set of sufficient conditions for $T$ to be
consistent are that
\[
E(T)\to\theta \qquad (16.4a)
\]
and
\[
\mathrm{var}(T)\to 0 \qquad (16.4b)
\]
as $n\to\infty$.
There may be found a large number of consistent estimators for $\theta$.
Indeed, if $T$ be consistent, so are, e.g., $T+\dfrac{a}{\psi(n)}$ and $T\left\{1+\dfrac{a}{\psi(n)}\right\}$,
where $a$ is any constant independent of $n$ and $\psi(n)$ is any increasing
function of $n$.
To choose among these rival estimators, some additional criterion
would be needed. Thus we may consider, together with stochastic
convergence, the rate of stochastic convergence; i.e., we may demand
not only that $T$ should converge stochastically to $\theta$ but that it
should do so sufficiently rapidly. We shall confine our attention to
consistent estimators that are asymptotically normally distributed
(vide Chapter 19). In that case, the rapidity of convergence will be
indicated by the inverse of the variance of the asymptotic distribution.
Denoting the asymptotic variance by 'avar', we may then
say that $T$ is the best estimator of $\theta$ if it is consistent and asymptotically
normally distributed and if
\[
\mathrm{avar}(T)\le\mathrm{avar}(T'),
\]
whatever the other consistent and asymptotically normal estimator
$T'$ may be.

A consistent, asymptotically normal statistic $T$ having this
property is called efficient.


Example 16.3 Consider the proportion of successes, f/n, for a set 
of n Bernoullian trials with probability of success p. From Corollary 
3.15.2, it follows that f/n is a consistent estimator of p. 

Further, the fact that $f$ has the binomial distribution, with parameters
$n$ and $p$, means that $f$ is asymptotically normally distributed
with mean $np$ and variance $np(1-p)$. Hence $f/n$ is seen to be
asymptotically normally distributed with expectation $p$ and variance
$p(1-p)/n$. Since this can also be shown to be the smallest asymptotic
variance for an asymptotically normally distributed consistent
estimator of $p$, $f/n$ is also efficient.
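The consistency of $f/n$ can be seen directly from the binomial distribution: for a fixed $\epsilon$, the probability $P[\,|f/n-p|<\epsilon\,]$ in (16.3) can be computed exactly and watched as $n$ grows. The sketch below does this for the illustrative values $p=0.3$ and $\epsilon=0.05$ (both assumptions); rational arithmetic is used only to decide which values of $f$ satisfy the strict inequality, so that boundary cases are not affected by rounding.

```python
from fractions import Fraction
from math import comb


def prob_within(n: int, p: Fraction, eps: Fraction) -> float:
    """P[|f/n - p| < eps] for f binomial(n, p); the event is decided in exact arithmetic."""
    pf = float(p)
    return sum(comb(n, f) * pf ** f * (1.0 - pf) ** (n - f)
               for f in range(n + 1)
               if abs(Fraction(f, n) - p) < eps)


p, eps = Fraction(3, 10), Fraction(1, 20)
sizes = (20, 100, 500, 1000)
probs = [prob_within(n, p, eps) for n in sizes]
for n, pr in zip(sizes, probs):
    print(n, round(pr, 4))
```

For these sample sizes the probability rises towards 1, which is what stochastic convergence of $f/n$ to $p$ requires.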


Example 16.4 Let $x_1, x_2, \ldots, x_n$ be independent random observations
from a normal distribution with mean $\mu$ and variance $\sigma^2$. If
$\sigma^2$ is finite, it follows from Corollary 3.15.1 that the sample mean $\bar{x}$ is a
consistent estimator of $\mu$.

Now the sampling distribution of $\bar{x}$ is exactly normal with mean $\mu$
and (exact) variance $\sigma^2/n$. Also, $\sigma^2/n$ can be shown to be the
smallest asymptotic variance for an asymptotically normally distributed
estimator of $\mu$. Hence $\bar{x}$ is also an efficient estimator of $\mu$.

On the contrary, the sample median $x_{me}$ has*
\[
E(x_{me})\simeq\mu \qquad (16.5a)
\]
and
\[
\mathrm{var}(x_{me})\simeq\pi\sigma^2/2n. \qquad (16.5b)
\]
Since $E(x_{me})\to\mu$ and $\mathrm{var}(x_{me})\to 0$ as $n\to\infty$, the sample median is
a consistent estimator of $\mu$. Like $\bar{x}$, $x_{me}$ is asymptotically normal.
But since $x_{me}$ has asymptotic variance $\pi\sigma^2/2n$, which is greater than
$\sigma^2/n$, it is an inefficient estimator.
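A small simulation makes the comparison concrete. Under the stated asymptotic results, the ratio of the variance of the median to that of the mean should be near $\pi/2\approx 1.57$ for moderately large $n$. The sketch below draws repeated normal samples with a fixed seed; the sample size, number of repetitions and seed are arbitrary illustrative choices, and the result is a Monte Carlo estimate subject to sampling error.

```python
import random
from math import pi
from statistics import median, pvariance


def simulate(n: int = 101, reps: int = 4000, mu: float = 0.0,
             sigma: float = 1.0, seed: int = 12345) -> tuple:
    """Return (variance of sample means, variance of sample medians) over reps samples."""
    rng = random.Random(seed)
    means, medians = [], []
    for _ in range(reps):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        means.append(sum(sample) / n)
        medians.append(median(sample))
    return pvariance(means), pvariance(medians)


var_mean, var_median = simulate()
ratio = var_median / var_mean
print(round(ratio, 2), "versus pi/2 =", round(pi / 2, 2))
```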


*The symbol '$\simeq$' denotes asymptotic equality.



Sufficiency 

The criteria of consistency and efficiency for a good estimator 
have been suggested by R. A. Fisher. 

Now, a preliminary choice among statistics for the purpose of
estimating $\theta$, before looking for a minimum-variance unbiassed or an
efficient consistent estimator, can be made on the basis of another
criterion suggested by Fisher. This is the criterion of sufficiency.
A statistic $T$ is called sufficient for $\theta$ (or, rather, for the family of
distributions characterised by $\theta$) if the conditional distribution of
any other statistic for given $T$ is independent of $\theta$. Obviously, if $T$
is of this type, then any inference regarding $\theta$ can be made on the
basis of $T$ alone, instead of starting with all $n$ observations. In
other words, $T$ provides a method of summarising the information
regarding $\theta$ contained in the whole sample into a single statistic.

A necessary and sufficient condition for $T$ to be sufficient for $\theta$ is
that the joint probability-density function or the joint probability-mass
function of $x_1, x_2, \ldots, x_n$ should be of the form
\[
f(x_1, x_2, \ldots, x_n\,|\,\theta)=g(T\,|\,\theta)\,h(x_1, x_2, \ldots, x_n), \qquad (16.6)
\]
where the first part of the right-hand side depends on $T$ and $\theta$, while
the second part is independent of $\theta$. This provides a simple method
of judging whether $T$ is really a sufficient statistic.


Example 16.5 Consider a set of $n$ Bernoullian trials with probability
of success $p$. With the $i$th trial we may associate a variable $x_i$
having the probability-mass function
\[
f(x_i\,|\,p)=p^{x_i}(1-p)^{1-x_i}, \quad \text{for } x_i=0, 1.
\]
The joint probability-mass function of $x_1, x_2, \ldots, x_n$ is
\[
f(x_1, x_2, \ldots, x_n\,|\,p)=p^{f}(1-p)^{n-f}=g(f\,|\,p)\,h(x_1, x_2, \ldots, x_n),
\]
where $f=\sum_i x_i$, $g(f\,|\,p)=p^{f}(1-p)^{n-f}$
and $h(x_1, x_2, \ldots, x_n)=1$.

Hence $f$, the number of successes in the $n$ trials taken together, is
a sufficient statistic for $p$. So is $\hat{p}=f/n$.
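The defining property of sufficiency can be checked numerically for this example: the conditional probability of a particular sequence of outcomes, given the number of successes $f$, should not depend on $p$. The sketch below computes that conditional probability as the joint probability divided by the binomial probability of $f$; the sequence and the values of $p$ are illustrative assumptions.

```python
from math import comb, isclose


def conditional_prob(xs: list, p: float) -> float:
    """P[x_1, ..., x_n | f] for Bernoullian trials, as joint probability / P[f]."""
    n, f = len(xs), sum(xs)
    joint = p ** f * (1 - p) ** (n - f)                 # probability of this exact sequence
    prob_f = comb(n, f) * p ** f * (1 - p) ** (n - f)   # binomial probability of f successes
    return joint / prob_f


xs = [1, 0, 0, 1, 1, 0]                                 # illustrative outcome, f = 3
values = [conditional_prob(xs, p) for p in (0.2, 0.5, 0.9)]
print(values)
```

Each value equals $1/\binom{6}{3}=0.05$ whatever $p$ is used, so once $f$ is known the arrangement of successes carries no further information about $p$.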


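Example 16.6 Let $x_1, x_2, \ldots, x_n$ be as in Example 16.4. Suppose
further that $\mu$ is unknown but $\sigma^2$ is known. Then the joint density
function of $x_1, x_2, \ldots, x_n$ is
\[
f(x_1, x_2, \ldots, x_n\,|\,\mu)=\frac{1}{(\sigma\sqrt{2\pi})^n}\exp\left[-\sum_i(x_i-\mu)^2/2\sigma^2\right]
\]
\[
=\exp[-n(\bar{x}-\mu)^2/2\sigma^2]\times\frac{1}{(\sigma\sqrt{2\pi})^n}\exp\left[-\sum_i(x_i-\bar{x})^2/2\sigma^2\right]
\]
\[
=g(\bar{x}\,|\,\mu)\,h(x_1, x_2, \ldots, x_n), \text{ say,}
\]
since $\sum_i(x_i-\mu)^2=\sum_i(x_i-\bar{x})^2+n(\bar{x}-\mu)^2$.
Hence $\bar{x}$, as well as $n\bar{x}=\sum_i x_i$, is a sufficient statistic for $\mu$.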
i 


16.3 Maximum-likelihood estimation 

There is a simple method of obtaining estimators with desirable 
properties. This is the method of maximum likelihood. 

Consider $f(x_1, x_2, \ldots, x_n\,|\,\theta)$, the joint probability-density or
probability-mass of the sample observations. For fixed $\theta$, it may be
looked upon as a function of the sample observations and then it
gives their probability-density function or probability-mass function.
But, when $x_1, x_2, \ldots, x_n$ are given, it may also be looked upon as a
function of $\theta$, called the likelihood function of $\theta$ and denoted by $L(\theta)$.
The principle of maximum likelihood consists in taking that value
as the estimate of $\theta$ for which $L(\theta)$ is a maximum. Thus if $\hat{\theta}$ be the
maximum-likelihood estimate of $\theta$, then by definition
\[
L(\hat{\theta})=\max_{\theta}L(\theta). \qquad (16.7)
\]
(The maximum-likelihood estimator is $\hat{\theta}$ when looked upon as a
function of the random variables $x_1, x_2, \ldots, x_n$.)

In many cases, it will be convenient to deal with $\ln L(\theta)$, rather
than $L(\theta)$, and since $\ln L(\theta)$ attains its highest value for the same
value of $\theta$ as $L(\theta)$ does, $\hat{\theta}$ is such that
\[
\ln L(\hat{\theta})=\max_{\theta}\ln L(\theta). \qquad (16.8)
\]
This $\hat{\theta}$, again, will in many cases be obtainable by differentiating
$\ln L(\theta)$, i.e. will be that value of $\theta$ for which
\[
\frac{d\ln L(\theta)}{d\theta}=0. \qquad (16.9)
\]
But one must make sure that the value obtained by solving (16.9),
which gives a local maximum of $L(\theta)$, also gives the absolute (global)
maximum. Indeed, the derivative may not exist at $\theta=\hat{\theta}$, and then
this method will fail.

Example 16.7 With $x_1, x_2, \ldots, x_n$ the same as in Example 16.5,
let us obtain the maximum-likelihood estimate of $p$. Here the likelihood
function is
\[
L(p)=p^{\sum_i x_i}(1-p)^{n-\sum_i x_i}
\]
and
\[
\ln L(p)=\Big(\sum_i x_i\Big)\ln p+\Big(n-\sum_i x_i\Big)\ln(1-p).
\]
Hence
\[
\frac{d\ln L(p)}{dp}=\frac{(1-p)\sum_i x_i-p\,(n-\sum_i x_i)}{p(1-p)}=\frac{\sum_i x_i-np}{p(1-p)},
\]
which equals zero for $p=\sum_i x_i/n$, provided $0<\sum_i x_i<n$. When $\sum_i x_i=0$,
it is directly found that $\ln L(p)$ or $L(p)$ is the highest for the smallest
value of $p$, i.e. for $p=0$. Similarly, when $\sum_i x_i=n$, $L(p)$ is the highest
for the highest value of $p$, i.e. for $p=1$. Hence, whatever the value
of $\sum_i x_i$, the maximum-likelihood estimate is
\[
\hat{p}=\sum_i x_i/n=f/n,
\]
the sample proportion of successes.
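The result can be confirmed by brute force: evaluating $\ln L(p)$ on a fine grid of interior values of $p$ and picking the largest should reproduce the sample proportion. The sketch below uses a made-up sample of ten trials; the grid spacing and the names are illustrative assumptions.

```python
from math import log


def log_likelihood(p: float, xs: list) -> float:
    """ln L(p) for Bernoullian observations xs (each 0 or 1), with 0 < p < 1."""
    s, n = sum(xs), len(xs)
    return s * log(p) + (n - s) * log(1 - p)


xs = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1]            # illustrative data: 6 successes in 10 trials
grid = [k / 1000 for k in range(1, 1000)]      # interior points only, so the logs are defined
p_grid = max(grid, key=lambda p: log_likelihood(p, xs))
p_hat = sum(xs) / len(xs)                      # closed-form estimate f/n
print(p_grid, p_hat)
```

A grid search of this kind is only a check; it would miss the boundary cases $\sum_i x_i=0$ and $\sum_i x_i=n$, which the text treats separately.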
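Example 16.8 Let $x_1, x_2, \ldots, x_n$ be a random sample of independent
observations from a Poisson distribution with parameter $\lambda$.
Then the likelihood function is
\[
L(\lambda)=\frac{e^{-n\lambda}\lambda^{\sum_i x_i}}{\prod_i x_i!}
\]
and
\[
\ln L(\lambda)=-n\lambda+\Big(\sum_i x_i\Big)\ln\lambda-\sum_i\ln(x_i!).
\]
Hence
\[
\frac{d\ln L(\lambda)}{d\lambda}=-n+\frac{1}{\lambda}\sum_i x_i,
\]
which equals zero if, and only if, $\lambda=\sum_i x_i/n$, provided $\sum_i x_i>0$. In
case $\sum_i x_i=0$, we find directly that $\ln L(\lambda)$ or $L(\lambda)$ becomes a maximum
when $\lambda$ takes its least possible value, i.e. when $\lambda=0$. Thus,
whatever the value of $\sum_i x_i$, the maximum-likelihood estimate of $\lambda$ is
\[
\hat{\lambda}=\sum_i x_i/n=\bar{x},
\]
the sample mean.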


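Example 16.9 Suppose $x_1, x_2, \ldots, x_n$ are a random sample from
a normal distribution with mean $\mu$ and variance $\sigma^2$.

Case 1: $\mu$ unknown, $\sigma$ known ($=\sigma_0$)

Here
\[
L(\mu)=\frac{1}{(\sigma_0\sqrt{2\pi})^n}\exp\left[-\sum_i(x_i-\mu)^2/2\sigma_0^2\right].
\]
Proceeding as in Example 16.7 and Example 16.8, we have
\[
\hat{\mu}=\bar{x}.
\]

Case 2: $\mu$ known ($=\mu_0$), $\sigma$ unknown

Here
\[
L(\sigma)=\frac{1}{(\sigma\sqrt{2\pi})^n}\exp\left[-\sum_i(x_i-\mu_0)^2/2\sigma^2\right],
\]
and, proceeding as before, we get
\[
\hat{\sigma}=\sqrt{\sum_i(x_i-\mu_0)^2/n}.
\]

Case 3: Both $\mu$ and $\sigma$ unknown

Since both parameters are unknown, the likelihood function
here is
\[
L(\mu, \sigma)=\frac{1}{(\sigma\sqrt{2\pi})^n}\exp\left[-\sum_i(x_i-\mu)^2/2\sigma^2\right].
\]
Now,
\[
\ln L(\mu, \sigma)=-\frac{n}{2}\ln(2\pi)-n\ln\sigma-\frac{\sum_i(x_i-\mu)^2}{2\sigma^2},
\]
and
\[
\frac{\partial\ln L(\mu, \sigma)}{\partial\mu}=\frac{\sum_i(x_i-\mu)}{\sigma^2},
\qquad
\frac{\partial\ln L(\mu, \sigma)}{\partial\sigma}=-\frac{n}{\sigma}+\frac{\sum_i(x_i-\mu)^2}{\sigma^3}.
\]
The maximum-likelihood estimates of $\mu$ and $\sigma$ will be obtained by
solving the simultaneous equations:
\[
\frac{\partial\ln L(\mu, \sigma)}{\partial\mu}=0, \qquad \frac{\partial\ln L(\mu, \sigma)}{\partial\sigma}=0.
\]
We thus have
\[
\hat{\mu}=\bar{x}
\]
and
\[
\hat{\sigma}=\sqrt{\sum_i(x_i-\bar{x})^2/n}.
\]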
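For Case 3 the closed-form estimates can be checked against the log-likelihood itself: moving either estimate slightly away from its value should lower $\ln L(\mu, \sigma)$. The following sketch assumes a small made-up sample and arbitrary perturbation sizes; the function names are illustrative.

```python
from math import log, pi, sqrt


def log_likelihood(mu: float, sigma: float, xs: list) -> float:
    """ln L(mu, sigma) for a normal sample xs."""
    n = len(xs)
    return (-n / 2 * log(2 * pi) - n * log(sigma)
            - sum((x - mu) ** 2 for x in xs) / (2 * sigma ** 2))


def mle_normal(xs: list) -> tuple:
    """Maximum-likelihood estimates of (mu, sigma) when both are unknown."""
    n = len(xs)
    mu_hat = sum(xs) / n
    sigma_hat = sqrt(sum((x - mu_hat) ** 2 for x in xs) / n)   # divisor n, not n - 1
    return mu_hat, sigma_hat


xs = [9.8, 10.4, 10.1, 9.5, 10.9, 10.2]        # illustrative observations
mu_hat, sigma_hat = mle_normal(xs)
best = log_likelihood(mu_hat, sigma_hat, xs)
for d_mu, d_sigma in ((0.05, 0.0), (-0.05, 0.0), (0.0, 0.05), (0.0, -0.05)):
    # Every perturbed value of the log-likelihood should fall below the maximum.
    print(log_likelihood(mu_hat + d_mu, sigma_hat + d_sigma, xs) < best)
```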


Apart from their intuitive appeal, maximum-likelihood estimators
possess several nice properties:

(a) Consistency: The maximum-likelihood estimator $\hat{\theta}$ of a
parameter $\theta$ is, under very general conditions, a consistent estimator.

(b) Asymptotic normality: It has also been found that, under
general conditions, $\hat{\theta}$ is asymptotically normally distributed with
mean $\theta$.

(c) Efficiency: Among all asymptotically normal consistent
estimators of $\theta$, $\hat{\theta}$ has generally the smallest asymptotic variance.
Hence $\hat{\theta}$ is generally efficient.

(d) Sufficiency: If there at all exists a sufficient statistic for $\theta$,
then $\hat{\theta}$ is also sufficient or is a function of a sufficient statistic.

(e) Unbiassedness: Generally, $\hat{\theta}$ will not be an unbiassed estimator,
but a simple modification will in most cases make it unbiassed.
In Example 16.9 (Case 3), for instance,
\[
\hat{\sigma}^2=\sum_i(x_i-\bar{x})^2/n
\]
is not an unbiassed estimator of $\sigma^2$, but
\[
\frac{n\hat{\sigma}^2}{n-1}=s'^2
\]
is.

(f) Invariance: A maximum-likelihood estimator of $\theta$ possesses
the desirable property of invariance. Thus if $\hat{\theta}$ is the maximum-likelihood
estimator of $\theta$, then $\psi(\hat{\theta})$ is also the maximum-likelihood
estimator of $\psi(\theta)$, $\psi$ being a single-valued function of $\theta$ with a unique
inverse.


16.4 Interval estimation of parameters 

Estimation of a parameter by a single value, as in the above
section, is referred to as point estimation. An alternative procedure
is to give an interval within which the parameter may be supposed
to lie. This is called interval estimation. This may also be illustrated
with the help of a variable $x$ which has a normal distribution in the
population with mean $\mu$ (unknown) and standard deviation $\sigma$ (known).
Let $x_1, x_2, \ldots, x_n$ be the values of $x$ in a random sample of size $n$
from this distribution.

Now, it is known that any linear function of normal variables is itself
normally distributed. The sample mean $\bar{x}$, being a linear function of
normal variables $x_1, x_2, \ldots, x_n$, is normally distributed and, as has
already been shown, it has mean $\mu$ and variance $\sigma^2/n$.


Hence
\[
\sqrt{n}(\bar{x}-\mu)/\sigma
\]
is a standard normal variable. It follows that
\[
P\left[-2.576\le\frac{\sqrt{n}(\bar{x}-\mu)}{\sigma}\le 2.576\right]=0.99
\]
or
\[
P\left[\bar{x}-2.576\frac{\sigma}{\sqrt{n}}\le\mu\le\bar{x}+2.576\frac{\sigma}{\sqrt{n}}\right]=0.99.
\]
The latter relation shows that in repeated sampling it is very likely,
the probability being 0.99, that the interval $(\bar{x}-2.576\sigma/\sqrt{n},\ \bar{x}+2.576\sigma/\sqrt{n})$
will include $\mu$. In other words, if a very large number
of samples, each of size $n$, are taken from the population and if for
each such sample the above interval is determined, then in about
99% of the cases the interval will include $\mu$, while in the remaining
1%, it will fail to do so (vide Section 3.7). One will, therefore, be
justified in saying, on the basis of a given sample, that $\mu$ lies
between $\bar{x}-2.576\sigma/\sqrt{n}$ and $\bar{x}+2.576\sigma/\sqrt{n}$, the limits being computed
from the observations in hand. These are called 99% confidence
limits to $\mu$, 0.99 being the confidence coefficient, a sort of measure of the
trust or confidence that one may place in these limits for actually
including $\mu$.
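The computation of these limits is short enough to write out. The sketch below obtains the normal multiplier from the confidence coefficient rather than hard-coding 2.576, and applies it to a made-up sample with an assumed known $\sigma$; the function name and the data are illustrative assumptions.

```python
from math import sqrt
from statistics import NormalDist


def confidence_limits(xs: list, sigma: float, conf: float = 0.99) -> tuple:
    """Confidence limits to the mean of a normal population with known sigma."""
    n = len(xs)
    xbar = sum(xs) / n
    # Upper (1 - conf)/2 point of the standard normal; about 2.576 when conf = 0.99.
    tau = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    half_width = tau * sigma / sqrt(n)
    return xbar - half_width, xbar + half_width


sample = [10.2, 9.7, 10.5, 10.1, 9.9, 10.4, 10.0, 9.8, 10.3]   # illustrative data
lower, upper = confidence_limits(sample, sigma=0.3)
print(round(lower, 4), round(upper, 4))
```

Here $\bar{x}=10.1$ and $\sigma/\sqrt{n}=0.1$, so the 99% limits are about $10.1\pm 0.2576$.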

The choice of the confidence coefficient in any particular case
will depend on the discretion of the experimenter himself. Naturally,
a value close to unity is selected. The general symbol for denoting
a confidence coefficient is $1-\alpha$.

We shall now deal with a more general set-up. Let $\theta$ be a parameter
and $T$ a statistic based on a random sample of size $n$ from the
corresponding population. We shall suppose that $T$ is a sufficient
statistic.

Now, in many cases it will be possible to find a function, say
$\psi(T, \theta)$, whose distribution is independent of $\theta$. The statement
\[
\psi_{1-\alpha/2}\le\psi(T, \theta)\le\psi_{\alpha/2},
\]
where $\psi_{1-\alpha/2}$ and $\psi_{\alpha/2}$ are the lower and upper $\alpha/2$-points of
the distribution of $\psi(T, \theta)$,
can often be written in an equivalent form as, say,
\[
\theta_1(T)\le\theta\le\theta_2(T).
\]
Hence
\[
P[\theta_1(T)\le\theta\le\theta_2(T)]=P[\psi_{1-\alpha/2}\le\psi(T, \theta)\le\psi_{\alpha/2}]=1-\alpha, \qquad (16.10)
\]
whatever the true value may be. The observed values of $\theta_1(T)$
and $\theta_2(T)$ will then be confidence limits to $\theta$ with confidence
coefficient $1-\alpha$.
In the above example, $\psi(\bar{x}, \mu)=\sqrt{n}(\bar{x}-\mu)/\sigma$, which is distributed
as a standard normal variable and hence independently of $\mu$.
The confidence limits discussed in Chapter 17 also belong to this
category.


16.5 Test of significance 

Suppose a variable $x$ is known to be normally distributed in a
given population, with a known variance $\sigma^2$ but with an unknown
mean $\mu$. Also, suppose it is suggested to us that the mean may be
equal to a specified value, say $\mu_0$, and we want to see how acceptable
this suggestion is. We have then the hypothesis
\[
H_0: \mu=\mu_0,
\]
which needs to be verified. Such a hypothesis is called a null
hypothesis, because it states that there is no difference between $\mu$ and
$\mu_0$. The verification (or test) of $H_0$ has to be done on the basis of a
random sample from this population. Let $x_1, x_2, \ldots, x_n$ be the
values of $x$ for a random sample of size $n$, the observations being
independent.

In order to test $H_0$, let us assume, to begin with, that it is true,
i.e. that the population mean of $x$ is in fact $\mu_0$. From this assumption
a number of results will follow. The most important result for our
purpose is that $\bar{x}$ is, according to the assumption, normally distributed
with mean $\mu_0$ and variance $\sigma^2/n$; in other words, $\sqrt{n}(\bar{x}-\mu_0)/\sigma$
is a normal deviate, $\tau$. As such,
\[
P[\sqrt{n}\,|\bar{x}-\mu_0|/\sigma>2.576]=0.01.
\]


To put it in a different way, in repeated sampling from this population,
in only one in a hundred samples is the value of $\sqrt{n}(\bar{x}-\mu_0)/\sigma$
expected to exceed 2.576 numerically. This fact then provides a test
for the hypothesis. If in a given sample $\sqrt{n}\,|\bar{x}-\mu_0|/\sigma$ exceeds 2.576,
then it means that a value has been obtained which is very improbable
under the hypothesis. In such a case the hypothesis itself will
be held in suspicion. We say $H_0$ is rejected. On the other hand, if
in the given sample $\sqrt{n}\,|\bar{x}-\mu_0|/\sigma$ does not exceed 2.576, i.e. if it
takes a value which is not improbable under the hypothesis, one
would find no reason to suspect the hypothesis. It would then be
said to be accepted.
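The decision rule just described amounts to a few lines of arithmetic. The sketch below computes the statistic $\sqrt{n}(\bar{x}-\mu_0)/\sigma$ and compares its absolute value with the critical value for a chosen probability level (2.576 for 0.01); the data, the hypothetical means and the function name are illustrative assumptions.

```python
from math import sqrt
from statistics import NormalDist


def two_sided_test(xs: list, mu0: float, sigma: float, alpha: float = 0.01) -> tuple:
    """Return (statistic, rejected?) for H0: mu = mu0 with known sigma."""
    n = len(xs)
    xbar = sum(xs) / n
    stat = sqrt(n) * (xbar - mu0) / sigma
    critical = NormalDist().inv_cdf(1 - alpha / 2)   # about 2.576 when alpha = 0.01
    return stat, abs(stat) > critical


sample = [10.2, 9.7, 10.5, 10.1, 9.9, 10.4, 10.0, 9.8, 10.3]   # illustrative data
stat_a, reject_a = two_sided_test(sample, mu0=10.0, sigma=0.3)
stat_b, reject_b = two_sided_test(sample, mu0=9.7, sigma=0.3)
print(round(stat_a, 3), reject_a)
print(round(stat_b, 3), reject_b)
```

With these numbers the statistic is about 1.0 for $\mu_0=10.0$, so that hypothesis is accepted, and about 4.0 for $\mu_0=9.7$, so that hypothesis is rejected.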

By acceptance of a hypothesis we do not mean that it is proved
to be true. All that is implied is that, so far as the given sample is
concerned, we find no reason to question the validity of the hypothesis.
Nor does rejection of $H_0$ mean a disproof of $H_0$. It means
simply that, in the light of the given sample, $H_0$ does not seem to be
a plausible hypothesis.

The mode of argument may be restated as follows: Some
difference between the sample mean $\bar{x}$ and the hypothetical population
mean $\mu_0$ is to be expected because of the inevitable sampling
fluctuations. However, if this difference be too large, say greater
than $2.576\sigma/\sqrt{n}$ (in other words, if $\sqrt{n}\,|\bar{x}-\mu_0|/\sigma>2.576$), then
one would say that it may not be due to sampling fluctuations
alone but arises because the true population mean is not $\mu_0$. One
would thus take it as significant or indicative of the falsity of the
hypothesis.

Hence a test of this kind is also called a test of significance. The
probability 0.01, on the basis of which the differences are being
regarded as significant of the falsity of the hypothesis or not, is
called the level of significance. The choice of the level of significance,
of course, depends on the experimenter himself. If he thinks that
rejection of the hypothesis when actually it is true will be a serious
error, he will choose a rather small value, say 0.01 or 0.001. On
the other hand, if he thinks that this error is not so serious, he will
not mind taking a value as high as, say, 0.05 or 0.1. The general
symbol for the level of significance is $\alpha$.*

The above test procedure will be appropriate when we are
interested in knowing whether $\mu$ is or is not equal to $\mu_0$, i.e.
when we want to test the null hypothesis $H_0: \mu=\mu_0$ against the
alternative hypothesis $H: \mu\neq\mu_0$.

* It is customary in common statistical work to take $\alpha=0.05$ or 0.01.

In some cases, however, we may want to know whether $\mu$ is
equal to $\mu_0$ or greater. $H_0$ is then to be tested against the
alternative hypothesis $H: \mu>\mu_0$.


Fig. 16.1 Distribution of $\sqrt{n}(\bar{x}-\mu_0)/\sigma$ for $\mu=\mu_0$ and
for $\mu=\mu_0+\delta$.

Here, too, very small values of the statistic
$\sqrt{n}(\bar{x}-\mu_0)/\sigma$, as well as very large values, are to be
regarded as unlikely when $\mu=\mu_0$. But then very small values of the
statistic are still more unlikely for a value of $\mu$ greater than
$\mu_0$ (as will be apparent from Fig. 16.1). And since here we are
concerned with a choice between $\mu_0$ and values of $\mu$ greater than
$\mu_0$, very small values of the statistic should lead to the
acceptance of $H_0$ rather than to its rejection. On the other hand,
very large values of the statistic, being unlikely when $\mu=\mu_0$ but
not so unlikely for $\mu>\mu_0$, should lead to the rejection of $H_0$.
If the level of significance be 0.01, $H_0$ is, therefore, to be
rejected in such a situation if for a given sample it is found that

$\sqrt{n}(\bar{x}-\mu_0)/\sigma > 2.326$,

since $P[\tau > 2.326] = 0.01$, and is to be accepted otherwise.
Similarly, when one wants to examine if $\mu$ is equal to $\mu_0$ or
smaller, i.e. when one wants to test $H_0: \mu=\mu_0$ against the
alternative hypothesis $H: \mu<\mu_0$, one should take only very small
values of $\sqrt{n}(\bar{x}-\mu_0)/\sigma$ as indicative of the falsity
of $H_0$. As regards very large values of the statistic, they are of
course extremely improbable when $\mu=\mu_0$. But then they are still
more improbable for any value of $\mu$ less than $\mu_0$ (see Fig. 16.2).
If the level of significance be 0.01, $H_0$ is, therefore, to be
rejected when for the given sample

$\sqrt{n}(\bar{x}-\mu_0)/\sigma < -2.326$

and is to be accepted otherwise.

Fig. 16.2 Distribution of $\sqrt{n}(\bar{x}-\mu_0)/\sigma$ for $\mu=\mu_0$ and
for $\mu=\mu_0-\delta$.

From the above discussion, it will also be apparent why, in testing
$H_0$ against the alternative $H: \mu\neq\mu_0$ at the level of
significance 0.01, we reject $H_0$ when $\sqrt{n}(\bar{x}-\mu_0)/\sigma$ is
less than $-2.576$ as well as when it is greater than 2.576.

The nature of the alternative hypothesis thus determines which 
type of test (one-sided or two-sided) is to be used in any given case— 
whether the left tail of the curve of the distribution of the relevant 
statistic or its right tail or both are to be taken for defining values 
that lead to the rejection of the null hypothesis. 
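
The dependence of the rejection rule on the alternative can be sketched as below. The function name `rejects_h0` and the labels "greater", "less" and "two-sided" are assumptions made for this illustration; the statistic `z` stands for $\sqrt{n}(\bar{x}-\mu_0)/\sigma$.

```python
from statistics import NormalDist


def rejects_h0(z, alternative, alpha=0.01):
    """Decide on H0: mu = mu0 from the normal deviate z.

    alternative: "greater" (mu > mu0, right tail), "less" (mu < mu0,
    left tail) or "two-sided" (mu != mu0, both tails).
    """
    std_normal = NormalDist()
    if alternative == "greater":
        return z > std_normal.inv_cdf(1 - alpha)      # 2.326 for alpha = 0.01
    if alternative == "less":
        return z < std_normal.inv_cdf(alpha)          # -2.326 for alpha = 0.01
    if alternative == "two-sided":
        return abs(z) > std_normal.inv_cdf(1 - alpha / 2)  # 2.576 for alpha = 0.01
    raise ValueError("alternative must be 'greater', 'less' or 'two-sided'")


# The same observed value may lead to different decisions.
for alt in ("greater", "less", "two-sided"):
    print(alt, rejects_h0(2.4, alt))
```

A value such as 2.4 lies between 2.326 and 2.576, so it is significant against the one-sided alternative $\mu>\mu_0$ but not against the two-sided one.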



16.6 Neyman and Pearson’s theory of testing of hypotheses 

For a solution to the problem of testing of hypotheses, we have
used here an intuitive approach. In order to give a more rational
treatment of the problem, it is necessary to consider the probabilities
of the two types of error that one may commit in rejecting or
accepting a hypothesis on the basis of sample observations. These are the
probability of the error committed in rejecting $H_0$ when, in fact,
it is true (error I)*, and the probability of the error committed
in accepting $H_0$ when actually the alternative hypothesis is true
(error II). This approach has been adopted by J. Neyman and
E. S. Pearson in formulating a theory of testing of hypotheses.

To present the basic principles of this theory, we shall suppose
that the population, from which the variables $x_1, x_2, \ldots, x_n$ are a
random sample, depends on a single unknown parameter $\theta$, the form
of the distribution being known. We shall suppose that the
population distribution is continuous and that

$f(x \mid \theta)$ ... (16.11)

is the joint density function of the variables $x=(x_1, x_2, \ldots, x_n)$.

We shall denote by $W$ the whole set of possible values of $x$. $W$
is called the sample space and may be looked upon as a region in $n$
dimensions.

Consider the hypothesis

$H_0: \theta=\theta_0$.

Any test for $H_0$ is nothing but a rule for rejecting or accepting $H_0$,
depending on the nature of the sample observations $x$. A test would
thus specify a set of points in $W$, called the critical region or region of
rejection and denoted by $w$, and would require the rejection of $H_0$ if $x$
lies in $w$ and the acceptance of $H_0$ if $x$ lies outside $w$ (or in $W-w$).
A test is thus determined by its critical region, and conversely.

The probability of error I associated with the test is then the
same as the probability that $x$ lies in $w$ when $\theta_0$ is the true
value of $\theta$, or, in symbols,

$P[x\in w \mid \theta_0] = \int\cdots\int_{w} f(x \mid \theta_0)\,dx$, ... (16.12)

where $dx = dx_1\,dx_2 \cdots dx_n$.

*The probability is the same as the level of significance $\alpha$ (in case $H_0$ is simple).




As regards the probability of error II, suppose $\theta_1$, and not
$\theta_0$, is the true value of $\theta$. Then the probability of error II
with respect to $\theta_1$ is the same as the probability that $x$ lies in
$W-w$ when $\theta_1$ is true or, in symbols,

$P[x\in W-w \mid \theta_1] = 1 - P[x\in w \mid \theta_1]$
$\qquad = 1 - \int\cdots\int_{w} f(x \mid \theta_1)\,dx$. ... (16.13)

$P[x\in w \mid \theta_1]$, which is the probability of rejecting $H_0$ when,
in fact, $\theta_1$ is the true value of $\theta$, is called the power of
the test with respect to $\theta_1$, because it measures in a sense the
capacity of the test to detect the falsity of $H_0$. Of course,
$P[x\in w \mid \theta_1]$ may vary with $\theta_1$, and thus
we may talk about the power function of the test.
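
As an illustration of a power function, the sketch below takes the parameter to be the mean of a normal population with known standard deviation and the critical region to be the right tail $\sqrt{n}(\bar{x}-\mu_0)/\sigma > \tau_\alpha$. That setting, the function name `power_right_tailed` and the numerical figures are assumptions made for the example only.

```python
from math import sqrt
from statistics import NormalDist


def power_right_tailed(mu1, mu0, sigma, n, alpha):
    """P[sqrt(n)(mean - mu0)/sigma > tau_alpha] when the true mean is mu1."""
    std_normal = NormalDist()
    critical = std_normal.inv_cdf(1 - alpha)
    shift = sqrt(n) * (mu1 - mu0) / sigma   # mean of the statistic under mu1
    return 1 - std_normal.cdf(critical - shift)


# Hypothetical setting: mu0 = 50, sigma = 4, n = 16, alpha = 0.05.
for mu1 in (50.0, 51.0, 52.0, 53.0):
    power = power_right_tailed(mu1, 50.0, 4.0, 16, 0.05)
    # Probability of error II at mu1 is 1 - power.
    print(mu1, round(power, 4), round(1 - power, 4))
```

At `mu1 = mu0` the function returns the probability of error I, and it rises towards 1 as `mu1` moves away from `mu0` in the direction of the alternative.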
Now, it would be an ideal situation if the test could minimise the
probabilities of both types of error at the same time. However,
with a fixed sample size, this is not possible: as one probability
decreases, the other increases. A reasonable procedure would then
be to fix the probability of error I at a desirable level, i.e. to make

$P[x\in w \mid \theta_0] = \alpha$ (say), ... (16.14)

and to choose from all critical regions $w$, satisfying (16.14), one
which has the maximum power (or, in other words, the minimum
probability of error II). In case we are interested in the alternative
hypothesis

$H: \theta\neq\theta_0$,

a critical region $w_0$ will be the best if it be such that

$P[x\in w_0 \mid \theta_0] = \alpha$ ... (16.15a)
and $P[x\in w_0 \mid \theta] \ge P[x\in w \mid \theta]$ for all $\theta\neq\theta_0$, ... (16.15b)

whatever the other region $w$, satisfying (16.15a), may be. Such a
region $w_0$ is called a uniformly most powerful critical region (and the
corresponding test a uniformly most powerful test) of size (or of
level) $\alpha$ for

$H_0: \theta=\theta_0$

against the alternative hypothesis

$H: \theta\neq\theta_0$.
In the same way, if we are interested in the alternative hypothesis
$H: \theta>\theta_0$ ($H: \theta<\theta_0$), then $w_0$ will be an ideal
critical region, called a uniformly most powerful critical region of
size $\alpha$ for $H_0: \theta=\theta_0$ against the alternative
$H: \theta>\theta_0$ ($H: \theta<\theta_0$), if it be such that

$P[x\in w_0 \mid \theta_0] = \alpha$ ... (16.16a)
and $P[x\in w_0 \mid \theta] \ge P[x\in w \mid \theta]$ for all $\theta>\theta_0$ (all $\theta<\theta_0$), ... (16.16b)

whatever the other region $w$, satisfying (16.16a), may be.

Unfortunately, for a two-sided alternative ($H: \theta\neq\theta_0$) in most
situations no uniformly most powerful test exists, although for
one-sided alternatives ($H: \theta>\theta_0$ or $H: \theta<\theta_0$) a
uniformly most powerful test exists in most cases.

For a two-sided alternative, therefore, some additional criterion has
to be found in making a choice among rival critical regions of size $\alpha$.
One such criterion is unbiassedness. A region $w$ is called biassed if

$P[x\in w \mid \theta] < P[x\in w \mid \theta_0]$

for some $\theta\neq\theta_0$. A biassed region would thus reject $H_0$ with a
smaller probability when $H_0$ is false than it would when $H_0$ is true.
This is an undesirable feature, and we should, therefore, look for
unbiassed regions alone and restrict our choice of a desirable region
of size $\alpha$ among those that are unbiassed. In this situation, a
region $w_0$ such that

$P[x\in w_0 \mid \theta_0] = \alpha$, ... (16.17a)
$P[x\in w_0 \mid \theta] \ge \alpha$ for all $\theta\neq\theta_0$, ... (16.17b)
and $P[x\in w_0 \mid \theta] \ge P[x\in w \mid \theta]$ for all $\theta\neq\theta_0$, ... (16.17c)

whatever the other region $w$, satisfying (16.17a) and (16.17b), may be,
may be regarded as best. It would be called a uniformly most powerful
unbiassed region of size $\alpha$ for testing $H_0: \theta=\theta_0$.
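
The role of unbiassedness can be seen numerically in the following sketch, which compares a right-tailed region and a two-tailed region for the normal mean with known standard deviation. The normal setting, the function names and the use of the standardised shift $\sqrt{n}(\mu-\mu_0)/\sigma$ as argument are assumptions made for this illustration.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()


def power_right_tailed(shift, alpha):
    """Power of the region tau > tau_alpha at a given standardised shift."""
    return 1 - STD_NORMAL.cdf(STD_NORMAL.inv_cdf(1 - alpha) - shift)


def power_two_tailed(shift, alpha):
    """Power of the region |tau| > tau_{alpha/2} at the same shift."""
    c = STD_NORMAL.inv_cdf(1 - alpha / 2)
    return (1 - STD_NORMAL.cdf(c - shift)) + STD_NORMAL.cdf(-c - shift)


alpha = 0.05
for shift in (-2.0, -1.0, 0.0, 1.0, 2.0):
    print(shift,
          round(power_right_tailed(shift, alpha), 4),
          round(power_two_tailed(shift, alpha), 4))
```

For positive shifts the right-tailed region is the more powerful of the two, but for negative shifts its power falls below $\alpha$, so it is biassed against the two-sided alternative; the two-tailed region never falls below $\alpha$.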

The intuitive method of test construction has been followed in
this book because of its simplicity. But it may be stated that the
tests obtained are, as a rule, the best available even from the point
of view of Neyman and Pearson.

Thus, when the alternative is one-sided our suggested test is
uniformly most powerful, while in the case of a two-sided alternative
the suggested test is, as a rule, uniformly most powerful among
unbiassed tests.


Example 16.10 The breaking strength of pieces of a type of string
has mean 18.2 lb. and standard deviation 2.1 lb. A new method of
manufacture of strings is supposed to give a higher mean breaking
strength, but for this the standard deviation is known to be
practically the same. Fifteen pieces manufactured by the new method
have breaking strength (in lb.) as follows:

    17.9    20      20.5
    16.8    19.2    17.4
    18.4    15.9    16.9
    18.0    19.8    17.6
    21.8    19.3    21.9

Do these results indicate that the new method is really a better one?
Let us denote the breaking strength of a string by $x$. The new
method of manufacture may be said to be better than the existing
method if in the population of strings produced by the new method
$x$ has mean ($\mu$) greater than 18.2 lb. We have thus to test a null
hypothesis in this case, viz.

$H_0: \mu = 18.2$ lb.,

against the alternative

$H: \mu > 18.2$ lb.

For this purpose, we assume (a) that in the population $x$ is
distributed in the normal form and (b) that the given values of $x$, say

$x_i$ ($i = 1, 2, \ldots, 15$),

are a random sample from the distribution.
The population standard deviation of $x$ is given: $\sigma=2.1$ lb.
Again, for the given sample $\bar{x}=18.83$ lb., so that

$\sqrt{n}(\bar{x}-18.2)/\sigma = \sqrt{15}\,(18.83-18.2)/2.1 = 1.162$.

Since it is smaller than 2.326 as well as 1.645, the hypothesis $H_0$
is to be accepted at both 1% and 5% levels of significance. In other
words, the new method of manufacture does not seem to be superior
to the old method.
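
The arithmetic of this example may be checked with the short sketch below. It uses only the summary figures quoted in the example ($n=15$, $\bar{x}=18.83$, $\mu_0=18.2$, $\sigma=2.1$); the variable names are assumptions made for the illustration.

```python
from math import sqrt
from statistics import NormalDist

# Summary figures quoted in Example 16.10.
n, sample_mean, mu0, sigma = 15, 18.83, 18.2, 2.1

statistic = sqrt(n) * (sample_mean - mu0) / sigma
print("statistic:", round(statistic, 3))

decisions = {}
for alpha in (0.01, 0.05):
    critical = NormalDist().inv_cdf(1 - alpha)   # right-tailed critical value
    decisions[alpha] = statistic > critical      # True means H0 is rejected
    print(alpha, round(critical, 3), "reject" if decisions[alpha] else "accept")
```

The statistic comes to about 1.16, below both one-sided critical values, in agreement with the conclusion reached in the example.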


Simple and composite hypotheses 

The hypothesis to be tested may or may not specify the population
distribution completely. When it does, it is called a simple hypothesis;
otherwise, it is called composite. Thus the null hypotheses considered
above are all simple hypotheses. On the other hand, suppose for a
normal distribution the mean ($\mu$) and variance ($\sigma^2$) are both
unknown, and let the hypothesis to be tested be $H_0: \mu=\mu_0$. It specifies
the mean, but the variance is left unspecified. Hence it is a
composite hypothesis with one degree of freedom (one parameter being
unspecified).

In the case of a composite hypothesis too, the general principles
to be followed remain the same as described previously. But here
in order to make the level of the test equal to $\alpha$, we must have

$P[x\in w \mid H_0]$ = constant, whatever the unspecified
parameters, say $\theta', \theta'', \ldots$, may be. ... (16.18)

Not all regions of $W$ may have this property. Those that do are
called similar regions (or regions similar to the sample space). Our
choice then has to be made among similar regions. We first select
all similar regions of size $\alpha$. Next, from these we choose one that is
uniformly most powerful, and if no uniformly most powerful similar
region exists, then one that is uniformly most powerful among
unbiassed similar regions of size $\alpha$.


16.7 Likelihood-ratio tests 
Closely allied to the maximum-likelihood method of estimation,
there is a simple procedure for obtaining tests which has an intuitive
appeal and which generally gives tests with desirable properties.
Suppose we are to test a simple hypothesis

$H_0: \theta=\theta_0$ ... (16.19)

against the alternative $H: \theta\neq\theta_0$. Given the sample observations,
$x=(x_1, x_2, \ldots, x_n)$, a natural way of judging the acceptability or
otherwise of the hypothesis would be to compare the likelihood $L(\theta_0)$
with the maximum possible value of $L(\theta)$. If

$\lambda = \dfrac{L(\theta_0)}{\max_{\theta} L(\theta)}$ ... (16.20)

were near to unity, then in the light of the given sample $H_0$ would
seem highly plausible; on the other hand, if this were near to
zero, $H_0$ would seem to have little validity. A test for $H_0$ is thus
provided by a critical region defined by $\lambda<\lambda_0$, where $\lambda_0$ is such that
$P[\lambda<\lambda_0 \mid H_0]=\alpha$. This relation fixes the probability of error I at $\alpha$.




As regards power, it can be shown that the likelihood-ratio test 
provides a sort of compromise among tests that would be most 
powerful for individual alternative values of 6. 

When more than one parameter is unknown, a hypothesis like
(16.19) leaves some parameter or parameters (say $\theta'$) unspecified; so
$L(\theta_0, \theta')$ in that case is not a constant. Hence the comparison is then
made between $\max_{H_0} L(\theta, \theta')$, the highest possible value of
$L(\theta, \theta')$ under the condition that the hypothesis is true, and the
absolute maximum, $\max L(\theta, \theta')$. The critical region here is
defined by $\lambda<\lambda_0$, where

$\lambda = \dfrac{\max_{H_0} L(\theta, \theta')}{\max L(\theta, \theta')} = \dfrac{\max_{\theta'} L(\theta_0, \theta')}{\max_{\theta,\,\theta'} L(\theta, \theta')}$ ... (16.21)

and $\lambda_0$ is such that

$P[\lambda<\lambda_0 \mid H_0]=\alpha$.

A result that will simplify the test in the case of large samples is
that $-2\ln\lambda$ is, under $H_0$, distributed approximately as a $\chi^2$ with
df $=\nu$, where $\nu$ denotes the number of parameters specified by $H_0$.

Example 16.11 Consider a random sample of size $n$ from a normal
distribution such that the sample observations are

$x_1, x_2, \ldots, x_n$.

(a) Suppose the mean of the population ($\mu$) is unknown but the
variance is known ($\sigma^2=\sigma_0^2$). To obtain the likelihood-ratio test for
the simple hypothesis

$H_0: \mu=\mu_0$,

note that here the likelihood function is

$L(\mu) = \dfrac{1}{(\sigma_0\sqrt{2\pi})^n}\exp\left[-\sum_i (x_i-\mu)^2/2\sigma_0^2\right]$.

The maximum-likelihood estimate of $\mu$ has been found to be $\bar{x}$ in
Section 16.3. Hence

$\max L(\mu) = \dfrac{1}{(\sigma_0\sqrt{2\pi})^n}\exp\left[-\sum_i (x_i-\bar{x})^2/2\sigma_0^2\right]$

and $\lambda = \dfrac{L(\mu_0)}{\max L(\mu)} = \exp\left[-n(\bar{x}-\mu_0)^2/2\sigma_0^2\right]$.


The critical region is thus given by

$\lambda<\lambda_0$
or $n(\bar{x}-\mu_0)^2/\sigma_0^2 > c$
or $\sqrt{n}\,|\bar{x}-\mu_0|/\sigma_0 > k$,

where $k$ is such that

$P\left[\sqrt{n}\,|\bar{x}-\mu_0|/\sigma_0 > k \mid H_0\right]=\alpha$.

Since $\sqrt{n}(\bar{x}-\mu_0)/\sigma_0$ is, under $H_0$, a standard normal
variable, we must have $k=\tau_{\alpha/2}$. Hence the critical region is given by

$\sqrt{n}\,|\bar{x}-\mu_0|/\sigma_0 > \tau_{\alpha/2}$.

(b) If both $\mu$ and $\sigma^2$ are unknown, we may like to test the
composite hypothesis

$H_0: \mu=\mu_0$.
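The likelihood function is now

$L(\mu, \sigma) = \dfrac{1}{(\sigma\sqrt{2\pi})^n}\exp\left[-\sum_i (x_i-\mu)^2/2\sigma^2\right]$.

The maximum-likelihood estimates of $\mu$ and $\sigma$ are $\bar{x}$ and $s$,
respectively. Hence

$\max L(\mu, \sigma) = \dfrac{1}{(s\sqrt{2\pi})^n}\exp[-n/2]$.

Under $H_0$, the likelihood function is

$L(\mu_0, \sigma) = \dfrac{1}{(\sigma\sqrt{2\pi})^n}\exp\left[-\sum_i (x_i-\mu_0)^2/2\sigma^2\right]$.

Here the maximum-likelihood estimate of $\sigma$ is

$s_0 = \sqrt{\sum_i (x_i-\mu_0)^2/n}$.

As such, $\max_{H_0} L(\mu, \sigma) = \max_{\sigma} L(\mu_0, \sigma) = \dfrac{1}{(s_0\sqrt{2\pi})^n}\exp[-n/2]$.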


The critical region for the likelihood-ratio test is then given by

$\lambda<\lambda_0$
or by $s^2/s_0^2 < k$,
or, since $ns_0^2 = n(\bar{x}-\mu_0)^2 + ns^2$ and $ns^2 = (n-1)s'^2$, by

$\sqrt{n}\,|\bar{x}-\mu_0|/s' > C$,

where $C$ is such that

$P\left[\sqrt{n}\,|\bar{x}-\mu_0|/s' > C \mid H_0\right]=\alpha$.

But $\sqrt{n}(\bar{x}-\mu_0)/s'$ is, under $H_0$, distributed as a $t$ with
df $=n-1$. Hence $C=t_{\alpha/2,\,n-1}$ and the test is defined by the
critical region

$\sqrt{n}\,|\bar{x}-\mu_0|/s' > t_{\alpha/2,\,n-1}$.
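
The equivalence between the likelihood ratio and the $t$ statistic in part (b) can be checked numerically. In the sketch below the identity $\lambda = \{1+t^2/(n-1)\}^{-n/2}$ is what follows from $ns_0^2 = n(\bar{x}-\mu_0)^2+ns^2$; the function name `lambda_and_t` and the data are hypothetical and serve only as an illustration.

```python
from math import sqrt


def lambda_and_t(xs, mu0):
    """Return the likelihood ratio and the t statistic for H0: mu = mu0."""
    n = len(xs)
    mean = sum(xs) / n
    s_sq = sum((x - mean) ** 2 for x in xs) / n    # ML estimate of variance
    s0_sq = sum((x - mu0) ** 2 for x in xs) / n    # ML estimate under H0
    lam = (s_sq / s0_sq) ** (n / 2)                # lambda = (s / s0)^n
    s_prime = sqrt(n * s_sq / (n - 1))             # divisor n - 1
    t = sqrt(n) * (mean - mu0) / s_prime
    return lam, t


# Hypothetical observations, used only to illustrate the identity.
xs = [4.1, 5.3, 4.8, 5.9, 5.1, 4.6]
lam, t = lambda_and_t(xs, mu0=4.5)
n = len(xs)
print(lam, (1 + t * t / (n - 1)) ** (-n / 2))
```

Since $\lambda$ is a decreasing function of $|t|$, small values of $\lambda$ correspond exactly to large values of $|t|$.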


Example 16.12 Consider $n$ Bernoullian trials with probability of
success $p$. We may associate with the $i$th trial a variable $x_i$ with
probability-mass function

$f(x_i) = p^{x_i}(1-p)^{1-x_i}$, for $x_i = 0, 1$.

For a given set of values of $x_1, x_2, \ldots, x_n$, the likelihood
function is

$L(p) = p^{s}(1-p)^{n-s}$,

where $s$ = number of successes in the $n$ trials taken together.
In order to test the hypothesis

$H_0: p=\tfrac{1}{2}$,

we shall make use of the likelihood-ratio

$\lambda = \dfrac{L(1/2)}{\max L(p)} = \dfrac{(1/2)^n}{(s/n)^{s}(1-s/n)^{n-s}} = \dfrac{(n/2)^n}{s^{s}(n-s)^{n-s}}$,

since the maximum-likelihood estimate of $p$ is $s/n$.


The critical region is then given by

$s^{s}(n-s)^{n-s} > C$,

or by

$s < \dfrac{n}{2}-C_1$ or $s > \dfrac{n}{2}+C_1$,

the function $s^{s}(n-s)^{n-s}$ being symmetrical about $s=n/2$ and
increasing as $s$ deviates in either direction from $n/2$. The constant
$C_1$ is such that

$P\left[s<\dfrac{n}{2}-C_1 \mid H_0\right] + P\left[s>\dfrac{n}{2}+C_1 \mid H_0\right]=\alpha$.

Actually, the level, in most cases, will only be approximately
equal to $\alpha$, since here we are dealing with a discrete variable.
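
The remark about the attainable level can be illustrated as follows. The sketch writes the symmetric region as $s \le k$ or $s \ge n-k$ and picks the largest integer $k$ whose exact level under $p=\tfrac{1}{2}$ does not exceed $\alpha$. The function name `symmetric_region` and the choice $n=20$, $\alpha=0.05$ are assumptions made for the example.

```python
from math import comb


def symmetric_region(n, alpha):
    """Largest k with P[s <= k or s >= n - k | p = 1/2] <= alpha.

    Returns (k, exact level); k is None if no such region exists.
    """
    best_k, best_level = None, 0.0
    lower_tail = 0
    for k in range((n - 1) // 2 + 1):      # k < n/2 keeps the two tails disjoint
        lower_tail += comb(n, k)
        level = 2 * lower_tail / 2 ** n    # the two tails are equal by symmetry
        if level > alpha:
            break
        best_k, best_level = k, level
    return best_k, best_level


k, level = symmetric_region(20, 0.05)
print(k, round(level, 4))
```

For these figures the region found is $s\le 5$ or $s\ge 15$, whose exact level is below the nominal 0.05; enlarging the region by one point on each side would push the level above 0.05.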


Questions and exercises 


16.1 Explain the problems of estimation and testing of hypo- 
theses. Distinguish between point estimation and interval estimation. 

16.2 Discuss, with examples, the notion of a minimum-variance 
unbiassed estimation. 

16.3 Explain, with suitable illustrations, the criteria of consis- 
tency, efficiency and sufficiency, as used in the theory of estimation. 

16.4 Give an outline of the Neyman-Pearson theory of testing 
of hypotheses, explaining the concepts of the errors of type I and 
type II, power and unbiassedness. 

16.5(a) Describe the maximum-likelihood method of estimation. 
What are the properties of a maximum-likelihood estimator ? 

(b) What is a likelihood-ratio test? What can be said of the 
large-sample behaviour of the associated test criterion ? 

16.6 A variable $x$ is normally distributed in the population with
mean 100 and standard deviation 5. Determine how large a sample
is to be taken from this population in order that the sample mean
will not differ from 100 by more than 1 with probability 0.95.

Ans. 97.

16.7 From a large lot of freshly-minted coins a random sample
of size 50 is taken. The mean weight of coins in the sample is found
to be 28.57 gm. Assuming that the population standard deviation
of weight is 1.25 gm., will it be reasonable to suppose that the
population mean is 28 gm.? Partial ans. $\tau=3.224$.

16.8 For the data of Exercise 16.7, obtain the 99% confidence
limits to the mean weight of all coins in the lot.
Ans. 28.11 gm. and 29.03 gm.
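16.9 Find the maximum-likelihood estimator of $\theta$ for a random
sample $x_1, x_2, \ldots, x_n$ from each of the following populations:

(a) $f(x)=\dfrac{1}{\theta}\exp(-x/\theta)$, $0<x<\infty$;
(b) $f(x)=\exp[-(x-\theta)]$, $\theta<x<\infty$;
(c) $f(x)=\tfrac{1}{2}\exp[-|x-\theta|]$, $-\infty<x<\infty$.

Ans. (a) $\hat{\theta}$ = sample mean; (b) $\hat{\theta}=x_{(1)}$, the smallest
observation; (c) $\hat{\theta}$ = sample median.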

16.10(a) An urn contains white and black balls in unknown
proportions, the total number of balls being 8. Three balls are
taken at random, of which 2 are found to be white and 1 black.
Find the maximum-likelihood estimate of the number of white balls
in the urn.

(Hint: The likelihood function is

$L(N_1) = \dbinom{N_1}{2}\dbinom{8-N_1}{1}\Big/\dbinom{8}{3}$.)

Ans. Both 5 and 6 are ML estimates.

(b) A random sample of size $n$ has been taken (without
replacements) from a population of size $N$. $N$ is unknown, but the
number of individuals in the population with the character A is
known to be $N_1$. Let $x$ among the $n$ members of the sample have
the character A. Show that the maximum-likelihood estimate of $N$
is, approximately, $nN_1/x$.

(Hint: The likelihood function is

$L(N) = \dbinom{N_1}{x}\dbinom{N-N_1}{n-x}\Big/\dbinom{N}{n}$.)

Discuss how this method may be used in estimating the number
of fish in a pond or the number of birds in an aviary.


16.11 For the set-up of Example 16.1, obtain the expected value
of $\dfrac{x}{n}\left(1-\dfrac{x}{n}\right)$. Hence suggest an unbiassed estimator of $p(1-p)$.


16.12 (a) Find the maximum-likelihood estimator of $1/p$ for the
observation $x$ from the discrete distribution with p.m.f.

$f(x) = (1-p)^{x-1}p$, for $x=1, 2, \ldots$

(b) Show that the estimator is unbiassed. What is the
variance of the estimator? Partial ans. The estimator of $1/p$ is $x$.
16.13 Let $x$ be normally distributed with known mean $\mu_0$ but
unknown variance $\sigma^2$. Evaluate

$E(|x-\mu_0|)$.

Hence suggest an unbiassed estimator of $\sigma$, based on a random
sample, $x_1, x_2, \ldots, x_n$, from the distribution. Derive the
expectation of $s_0=\sqrt{\sum_i (x_i-\mu_0)^2/n}$ and suggest an alternative
unbiassed estimator of $\sigma$. Compare the variances of the two
estimators and comment.
16.14 Starting from the equation

$\sigma^2=E(x^2)-\mu^2$,

obtain an unbiassed estimator of $\mu^2$. What is its principal defect?
16.15 $x_1, x_2, \ldots, x_n$ are independent, random observations
from the rectangular population with density

$f(x)=\dfrac{1}{\theta}$, $0<x<\theta$.

Consider the critical region $x_{(n)}>0.8$ for testing the hypothesis
$H_0: \theta=1$, where $x_{(n)}$ is the largest of $x_1, x_2, \ldots, x_n$. What is the
associated probability of error I and what is the power function?

Partial ans. Power function is

0 for $\theta\le 0.8$,
$1-(0.8/\theta)^n$ for $\theta>0.8$.


16.16 Obtain the likelihood-ratio tests, based on random samples
$x_1, x_2, \ldots, x_n$, for the following hypotheses:

(a) $H_0: \theta=\theta_0$ when $f(x)=\theta e^{-\theta x}$, $0<x<\infty$;

(b) $H_0: \theta=\theta_0$ when $f(x)=\dfrac{1}{\theta}$, $0<x<\theta$.


16.17 Let $t_1, t_2, \ldots, t_k$ be mutually independent and unbiassed
estimators of $p$ with variances $V_1, V_2, \ldots, V_k$, respectively.
Consider a linear function:

$T = a+\sum_i b_i t_i$.

Choose the constants $a, b_1, b_2, \ldots, b_k$ in such a way that $T$ is
unbiassed and has the smallest variance among all unbiassed linear
estimators.


SUGGESTED READING 


[1] Anderson, R. L. and Bancroft, T. A. Statistical Theory in Research
(Chs. 8-11). McGraw-Hill, 1952.

[2] Hogg, R. V. and Craig, A. T. Introduction to Mathematical
Statistics (Chs. 5, 9-11). Macmillan, 1965, and Amerind.

[3] Keeping, E. S. Introduction to Statistical Inference (Chs. 5, 6).
Van Nostrand, 1962, and Affiliated East-West Press.

[4] Mood, A. M., Graybill, F. A. and Boes, D. C. Introduction to the
Theory of Statistics (Chs. 7, 8, 11, 12). McGraw-Hill, 1963, and
Kogakusha.

[5] Rao, C. R. Advanced Statistical Methods in Biometric Research
(Chs. 4, 8a). John Wiley, 1952.

[6] Wald, A. Principles of Statistical Inference. Notre Dame, 1942.


17 EXACT TESTS AND
CONFIDENCE INTERVALS


17.1 Introduction
The general principles followed in testing a hypothesis regarding
a parameter or in obtaining confidence limits for a parameter have
been explained in the previous chapter. We shall now consider the
application of the general principles to particular problems.


17.2 Tests relating to binomial distributions 

Suppose a random sample of size $n$ is drawn from an infinite
population for which the proportion of individuals having a
character A, say $p$, is unknown. In order to test the hypothesis

$H_0: p=p_0$,

we make use of the statistic $x$, the number of members of the sample
having the character A, which is a sufficient statistic for $p$.
Now, under the hypothesis $H_0$, $x$ is distributed binomially with
parameters $n$ and $p_0$.
Let the observed value of $x$ be $x_0$.

(a) In case we are required to test $H_0$ against the alternatives
$H: p>p_0$, we shall compute the probability

$P[x\ge x_0 \mid p_0] = \sum_{x=x_0}^{n}\dbinom{n}{x}p_0^{x}(1-p_0)^{n-x}$.

If this does not exceed the specified level of significance, $\alpha$, we
shall consider $x_0$ to be an unlikely value under the hypothesis and
shall, therefore, reject $H_0$. Otherwise, $H_0$ will be accepted.

(b) If, on the other hand, the alternative hypothesis is
$H: p<p_0$, we shall compute

$P[x\le x_0 \mid p_0] = \sum_{x=0}^{x_0}\dbinom{n}{x}p_0^{x}(1-p_0)^{n-x}$

and shall accept or reject $H_0$ according as this probability does or
does not exceed $\alpha$.


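(c) The two-sided alternative $H: p\neq p_0$ may be of interest
in case $p_0=\tfrac{1}{2}$ (e.g. when our problem is to test whether a coin is
unbiassed). Here we compute

$P\left[\left|x-\dfrac{n}{2}\right|\ge d \mid p=\tfrac{1}{2}\right] = P\left[x\le\dfrac{n}{2}-d \mid p=\tfrac{1}{2}\right] + P\left[x\ge\dfrac{n}{2}+d \mid p=\tfrac{1}{2}\right]$,

where $d=\left|x_0-\dfrac{n}{2}\right|$, and compare it with $\alpha$ for acceptance
or rejection of $H_0: p=\tfrac{1}{2}$.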

For some selected n and p, these probabilities may be had from 
Table 37 of the Biometrika Tables, Vol. I. Much more extensive are 
the Tables of the Binomial Probability Distribution, prepared by the 
(U.S.) National Bureau of Standards. 

Consider next two populations for which the proportions of
individuals having a character A are $p_1$ and $p_2$. Again, we shall
denote by $x_1$ and $x_2$, respectively, the numbers of members having
the character A in random samples of sizes $n_1$ and $n_2$ drawn
independently from the two populations. We may be interested in the
hypothesis

$H_0: p_1=p_2$.

We shall again make use of the statistics $x_1$ and $x_2$, but shall
concentrate our attention on samples for which $x=x_1+x_2$ is a constant (the
same as the observed sum of $x_1$ and $x_2$). Under $H_0$, if we denote the
common value of the two proportions by $p$, the p.m.f.'s of $x_1$, $x_2$ and
$x=x_1+x_2$ are

$f(x_1) = \dbinom{n_1}{x_1}p^{x_1}(1-p)^{n_1-x_1}$,
$f(x_2) = \dbinom{n_2}{x_2}p^{x_2}(1-p)^{n_2-x_2}$
and $f(x) = \dbinom{n_1+n_2}{x}p^{x}(1-p)^{n_1+n_2-x}$.

The conditional p.m.f. of $x_1$ for given $x$ is, therefore,

$f(x_1 \mid x) = \dfrac{\dbinom{n_1}{x_1}p^{x_1}(1-p)^{n_1-x_1}\,\dbinom{n_2}{x-x_1}p^{x-x_1}(1-p)^{n_2-x+x_1}}{\dbinom{n_1+n_2}{x}p^{x}(1-p)^{n_1+n_2-x}}$

$\qquad = \dbinom{n_1}{x_1}\dbinom{n_2}{x-x_1}\Big/\dbinom{n_1+n_2}{x}$

(i.e. is of the hypergeometric form).
If the observed value of $x_1$ is $x_{10}$ and that of $x$ is $x_0$, we consider
the conditional p.m.f. $f(x_1 \mid x_0)$ for testing $H_0$.

(a) If we are interested in the alternative $H: p_1>p_2$, we
compute

$P[x_1\ge x_{10} \mid x=x_0] = \sum_{x_1\ge x_{10}}\dbinom{n_1}{x_1}\dbinom{n_2}{x_0-x_1}\Big/\dbinom{n_1+n_2}{x_0}$

and accept or reject $H_0$ according as this probability does or
does not exceed $\alpha$.

(b) On the other hand, if we are interested in the alternative
$H: p_1<p_2$, we compute

$P[x_1\le x_{10} \mid x=x_0] = \sum_{x_1\le x_{10}}\dbinom{n_1}{x_1}\dbinom{n_2}{x_0-x_1}\Big/\dbinom{n_1+n_2}{x_0}$

and compare it with $\alpha$ for acceptance or rejection of $H_0$.

These probabilities may be obtained directly from the Tables of 
the Hypergeometric Distribution (Stanford University Press). 

Example 17.1 A manufacturer of fluorescent tubes claims that no
more than 6% of his product is defective. A sample of 20 tubes is
found to contain 4 defective tubes. Does the manufacturer's claim
seem justified in the light of these data?

The number of defectives in a sample of size 20 may be supposed
to be distributed in the binomial form with parameters $n=20$ and $p$
(unknown). Under the hypothesis $H_0: p=0.06$, which is to be tested
against the alternative $H: p>0.06$, the probability that a sample
will have 4 defectives or more is

$P[x\ge 4 \mid p=0.06] = 1-\sum_{x=0}^{3}\dbinom{20}{x}(0.06)^{x}(0.94)^{20-x}$
$\qquad = 1-0.97104 = 0.02896$.

Since this probability does not exceed 0.05, the hypothesis is to be
rejected at the 5% level. The data thus seem to contradict the claim
made by the manufacturer. (If one uses the 1% level, then, of course,
the hypothesis will have to be accepted, i.e. one will not consider
the data to be incompatible with the manufacturer's assertion.)
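
The tail probability of this example may be verified directly from the binomial p.m.f., as in the sketch below. The function name `binomial_upper_tail` is an assumption made for the illustration.

```python
from math import comb


def binomial_upper_tail(x0, n, p):
    """P[x >= x0] for a binomial variable with parameters n and p."""
    return sum(comb(n, x) * p ** x * (1 - p) ** (n - x) for x in range(x0, n + 1))


# Figures of Example 17.1: n = 20, p0 = 0.06, observed x0 = 4.
p_value = binomial_upper_tail(4, 20, 0.06)
print(round(p_value, 5))
print("reject at 5%:", p_value <= 0.05, " reject at 1%:", p_value <= 0.01)
```

The computed probability agrees with the value 0.02896 quoted in the example up to rounding, so the hypothesis is rejected at the 5% level but not at the 1% level.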
Example 17.2 It is required to compare two methods of treating a
type of allergy. Method I was used on 15 patients and Method II
on 14. The results are shown below:

                 Method I    Method II
    Cured            6           11
    Not cured        9            3
    Total           15           14

Is Method II better than Method I?

If $p_1$ denotes the proportion of cured persons among all people
treated by Method I and $p_2$ the corresponding proportion for
Method II, then the hypothesis to be tested is $H_0: p_1=p_2$.

Here $n_1=15$, $n_2=14$, $x_0=17$ and $x_{10}=6$.

As the hypothesis $H_0: p_1=p_2$ is to be tested against $H: p_1<p_2$,
we compute

$P[x_1\le 6 \mid x=17] = \sum_{x_1\le 6}\dbinom{15}{x_1}\dbinom{14}{17-x_1}\Big/\dbinom{29}{17} = 0.0407$.

Since this does not exceed 0.05, $H_0$ is to be rejected at the 5%
level. Thus at the 5% level, Method II is to be considered better
than Method I.
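
The conditional (hypergeometric) probability used in this example can be computed as in the following sketch. The function name `hypergeometric_lower_tail` is an assumption; the figures are those stated in the example ($n_1=15$, $n_2=14$, $x_0=17$, $x_{10}=6$).

```python
from math import comb


def hypergeometric_lower_tail(x10, x0, n1, n2):
    """P[x1 <= x10 | x1 + x2 = x0] under H0: p1 = p2."""
    total = comb(n1 + n2, x0)
    smallest = max(0, x0 - n2)            # x1 cannot be smaller than this
    favourable = sum(comb(n1, x1) * comb(n2, x0 - x1)
                     for x1 in range(smallest, x10 + 1))
    return favourable / total


p_value = hypergeometric_lower_tail(x10=6, x0=17, n1=15, n2=14)
print(round(p_value, 4))
```

Because 17 cures cannot all come from the 14 patients of Method II, the sum runs over $x_1 = 3, 4, 5, 6$ only.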




17.3 Tests relating to Poisson distributions 
Suppose $x_1, x_2, \ldots, x_n$ are a random sample from a Poisson
distribution with unknown parameter $\lambda$. Here we may be required
to test the hypothesis

$H_0: \lambda=\lambda_0$.

To develop a test we make use of the sufficient statistic

$y=\sum_{i=1}^{n} x_i$,

which is itself distributed, as has been shown in Section 15.5, in
the Poisson form with parameter $n\lambda$. The p.m.f. of $y$ under $H_0$ is,
therefore,

$f(y) = \exp(-n\lambda_0)\,(n\lambda_0)^{y}/y!$.

Let $y_0$ be the observed value of $y$.

(a) If the alternative of interest is $H: \lambda>\lambda_0$, we evaluate

$P[y\ge y_0 \mid \lambda_0] = \sum_{y=y_0}^{\infty}\exp(-n\lambda_0)\,(n\lambda_0)^{y}/y!$

and proceed as in Section 17.2.

(b) On the other hand, if the alternative of interest is
$H: \lambda<\lambda_0$, then we have to compute

$P[y\le y_0 \mid \lambda_0] = \sum_{y=0}^{y_0}\exp(-n\lambda_0)\,(n\lambda_0)^{y}/y!$.

These probabilities may be obtained from Table 7 of Biometrika
Tables, Vol. I.
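
Where tables are not at hand, the two Poisson tail probabilities may be computed directly, as in the sketch below. The function names and the figures ($n=5$, $\lambda_0=2$, $y_0=17$) are hypothetical and serve only as an illustration.

```python
from math import exp, factorial


def poisson_lower_tail(y0, mean):
    """P[y <= y0] for a Poisson variable with the given mean."""
    return sum(exp(-mean) * mean ** y / factorial(y) for y in range(y0 + 1))


def poisson_upper_tail(y0, mean):
    """P[y >= y0], obtained as 1 - P[y <= y0 - 1]."""
    return 1 - poisson_lower_tail(y0 - 1, mean)


# Hypothetical case: n = 5 observations, lambda_0 = 2, observed total y0 = 17.
n, lambda0, y0 = 5, 2.0, 17
p_value = poisson_upper_tail(y0, n * lambda0)
print(round(p_value, 4), "reject at 5%:", p_value <= 0.05)
```

The mean passed to the functions is $n\lambda_0$, since the test is based on the total $y$ and not on a single observation.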

We may, again, be interested in a comparison of the parameters
$\lambda_1$ and $\lambda_2$ of two Poisson distributions. Let $x_{1i}$ ($i=1, 2, \ldots, n_1$) be
a random sample from the first distribution and $x_{2i}$ ($i=1, 2, \ldots, n_2$)
be a random sample from the second, the two samples themselves
being mutually independent. A test for

$H_0: \lambda_1=\lambda_2$

should be based on the sufficient statistics

$y_1=\sum_{i=1}^{n_1} x_{1i}$

and $y_2=\sum_{i=1}^{n_2} x_{2i}$.

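We consider only those samples of sizes $n_1$ and $n_2$ for which
$y=y_1+y_2$ is a constant (the same as the observed sum of $y_1$ and $y_2$).
Under $H_0$, if we denote the common value of the two parameters by
$\lambda$, then the p.m.f.'s of $y_1$, $y_2$ and $y=y_1+y_2$ are

$f(y_1) = \exp(-n_1\lambda)(n_1\lambda)^{y_1}/y_1!$, $\quad f(y_2) = \exp(-n_2\lambda)(n_2\lambda)^{y_2}/y_2!$
and $f(y) = \exp[-(n_1+n_2)\lambda]\,[(n_1+n_2)\lambda]^{y}/y!$.

The conditional p.m.f. of $y_1$ for given $y$ is, therefore,

$f(y_1 \mid y) = \dfrac{\{\exp(-n_1\lambda)(n_1\lambda)^{y_1}/y_1!\}\{\exp(-n_2\lambda)(n_2\lambda)^{y-y_1}/(y-y_1)!\}}{\exp[-(n_1+n_2)\lambda]\,[(n_1+n_2)\lambda]^{y}/y!}$

$\qquad = \dbinom{y}{y_1}\left(\dfrac{n_1}{n_1+n_2}\right)^{y_1}\left(\dfrac{n_2}{n_1+n_2}\right)^{y-y_1}$, ... (17.2)

i.e. is binomial with parameters $y$ and $n_1/(n_1+n_2)$.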


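Denoting the observed values of $y_1$ and $y$ by $y_{10}$ and $y_0$,
respectively, we consider the conditional p.m.f. $f(y_1 \mid y_0)$ for testing $H_0$.

(a) In case our interest lies in the alternative $H: \lambda_1>\lambda_2$, we
compute

$P[y_1\ge y_{10} \mid y=y_0] = \sum_{y_1=y_{10}}^{y_0}\dbinom{y_0}{y_1}\left(\dfrac{n_1}{n_1+n_2}\right)^{y_1}\left(\dfrac{n_2}{n_1+n_2}\right)^{y_0-y_1}$

and compare it with $\alpha$ for acceptance or rejection of $H_0$.

(b) If, instead, we are interested in the alternative $H: \lambda_1<\lambda_2$,
we have to compute

$P[y_1\le y_{10} \mid y=y_0] = \sum_{y_1=0}^{y_{10}}\dbinom{y_0}{y_1}\left(\dfrac{n_1}{n_1+n_2}\right)^{y_1}\left(\dfrac{n_2}{n_1+n_2}\right)^{y_0-y_1}$

and compare it with $\alpha$ for acceptance or rejection of $H_0$.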
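
The conditional binomial test for two Poisson parameters may be sketched as follows. The function name `conditional_upper_tail` and the figures ($n_1=n_2=10$, $y_{10}=15$, $y_0=20$) are hypothetical and are used only to illustrate the computation for the alternative $\lambda_1>\lambda_2$.

```python
from math import comb


def conditional_upper_tail(y10, y0, n1, n2):
    """P[y1 >= y10 | y1 + y2 = y0] under H0: lambda1 = lambda2.

    Given the total y0, y1 is binomial with parameters y0 and n1/(n1 + n2).
    """
    q = n1 / (n1 + n2)
    return sum(comb(y0, y1) * q ** y1 * (1 - q) ** (y0 - y1)
               for y1 in range(y10, y0 + 1))


# Hypothetical totals: 15 events in the first sample out of 20 in all.
p_value = conditional_upper_tail(y10=15, y0=20, n1=10, n2=10)
print(round(p_value, 4), "reject at 5%:", p_value <= 0.05)
```

The unknown common value of the two parameters does not enter the computation, which is what makes the conditional test usable.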


17.4 A test for independence of two attributes 

In many investigations one is faced with the problem of judging 
whether two qualitative characters, say A and B, may be said to 
be independent. 

Let us denote the forms of A by $A_i$ ($i=1, 2, \ldots, k$), the forms
of B by $B_j$ ($j=1, 2, \ldots, l$), and the proportion associated with the
cell $A_iB_j$ in the two-way classification of the population by $p_{ij}$. The
probability associated with $A_i$ is then

$p_{i0}=\sum_{j=1}^{l} p_{ij}$

and the probability associated with $B_j$ is

$p_{0j}=\sum_{i=1}^{k} p_{ij}$.

The hypothesis to be tested here is

$H_0: p_{ij}=p_{i0}\,p_{0j}$ for all $i, j$.
Suppose now that for a random sample of size $n$ drawn with
replacements (or, in case the population is infinite, without
replacements), $n_{ij}$ is the observed frequency for the cell $A_iB_j$. The marginal
frequency of $A_i$ is

$n_{i0}=\sum_{j=1}^{l} n_{ij}$,

and the marginal frequency of $B_j$ is

$n_{0j}=\sum_{i=1}^{k} n_{ij}$.

Note that the joint p.m.f. of $n_{ij}$ is multinomial:

$f(n_{11}, n_{12}, \ldots, n_{kl} \mid p_{11}, p_{12}, \ldots, p_{kl}) = \dfrac{n!}{\prod_{i=1}^{k}\prod_{j=1}^{l} n_{ij}!}\prod_{i=1}^{k}\prod_{j=1}^{l}(p_{ij})^{n_{ij}}$.

Under $H_0$, this is

$\dfrac{n!}{\prod_{i}\prod_{j} n_{ij}!}\prod_{i=1}^{k}(p_{i0})^{n_{i0}}\prod_{j=1}^{l}(p_{0j})^{n_{0j}}$.

This could be used for testing $H_0$ if $p_{i0}$ ($i=1, 2, \ldots, k$) and $p_{0j}$
($j=1, 2, \ldots, l$) were known quantities.

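When these are unknown, we use, instead of the unconditional
distribution of $n_{ij}$ (all $i, j$), their conditional distribution for fixed
marginals, $n_{i0}$ (all $i$) and $n_{0j}$ (all $j$). Under $H_0$ the joint p.m.f.
of $n_{i0}$ ($i=1, 2, \ldots, k$) is

$\dfrac{n!}{\prod_{i} n_{i0}!}\prod_{i=1}^{k}(p_{i0})^{n_{i0}}$

and the joint p.m.f. of $n_{0j}$ ($j=1, 2, \ldots, l$) is

$\dfrac{n!}{\prod_{j} n_{0j}!}\prod_{j=1}^{l}(p_{0j})^{n_{0j}}$.

Therefore, under $H_0$ the conditional distribution of $n_{ij}$ (all $i, j$) for
fixed marginals has the p.m.f.

$\dfrac{\dfrac{n!}{\prod_{i}\prod_{j} n_{ij}!}\prod_{i}(p_{i0})^{n_{i0}}\prod_{j}(p_{0j})^{n_{0j}}}{\dfrac{n!}{\prod_{i} n_{i0}!}\prod_{i}(p_{i0})^{n_{i0}}\cdot\dfrac{n!}{\prod_{j} n_{0j}!}\prod_{j}(p_{0j})^{n_{0j}}} = \dfrac{\prod_{i} n_{i0}!\,\prod_{j} n_{0j}!}{n!\,\prod_{i}\prod_{j} n_{ij}!}$. ... (17.3)

This conditional distribution now provides us with a test for $H_0$.
The use of this technique is illustrated in the following example
with a 2x2 table.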
The use of this technique is illustrated in the following example 
with a 2x2 table. 


Example 17.3 The following table is based on a random sample 
of persons attending the preview of a motion picture ; 


| Age below 40 Age 40 or above Total 

Liked the picture | 32 | 8 | 40 
10 

50 


Judge whether the picture has equal appeal to the young and the 
old or whether it is more liked by the young. 

The problem here is to test the hypothesis (H,) that the two 
attributes, youth and liking for the picture, are independent, against 
the alternative that they are positively associated. We have then to 
add up the probabilities, under Ho, of the given table and of those 
indicating more extreme positive association (and having the same 
marginals). These tables and the corresponding probabilities are : 


Probability = 0.0172        Probability = 0.0024
Probability = 0.0002        Probability = 0.0000

Since the sum of the probabilities, viz. 0.0198, is smaller than 0.05,
$H_0$ is to be rejected at the 5% level. In other words, here the data
do indicate that the picture is more popular with the young than
with the old.
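The summation in Example 17.3 can be sketched in a few lines. This is only an illustration: it takes the second row of the table as (4, 6), the only entries consistent with the margins 40, 10, 50 and with the printed probabilities, and the function name `table_probability` is our own.

```python
from math import comb

def table_probability(a, row1, row2, col1):
    """Probability (17.2) of a 2x2 table with cell (1, 1) equal to a,
    for fixed margins; for k = l = 2 it is a hypergeometric probability."""
    n = row1 + row2
    return comb(row1, a) * comb(row2, col1 - a) / comb(n, col1)

# Margins: 40 liked, 10 did not like; 36 aged below 40 (ASSUMED: 32 + 4).
row1, row2, col1 = 40, 10, 36
observed = 32

# The given table and all tables showing more extreme positive association.
probs = [table_probability(a, row1, row2, col1)
         for a in range(observed, min(row1, col1) + 1)]
p_value = sum(probs)

print([round(p, 4) for p in probs])
print(round(p_value, 4))
```

The one-sided sum is compared with the chosen level of significance, here 0.05.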

In the following sections, we shall consider different problems in
testing of hypotheses and interval estimation which can be solved by
using the four fundamental distributions discussed in Section 15.6. It
will be assumed in all cases that the population distribution is normal.


17.5 Problems regarding a univariate normal distribution 

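Consider a population where x is normally distributed with mean
$\mu$ and standard deviation $\sigma$. Let $x_1, x_2, \ldots, x_n$ be a random sample
obtained from this distribution. We shall denote by $\bar{x}$ the sample
mean of x:

$$\bar{x} = \sum_i x_i / n,$$

and by $s'^2$ the sample variance of x:

$$s'^2 = \frac{1}{n-1} \sum_i (x_i - \bar{x})^2.$$

The distinction between $s'^2$ and $s^2$ is to be noted. In $s'^2$ the
divisor is n - 1, which makes it an unbiassed estimator of $\sigma^2$. For

$$\sum_i (x_i - \bar{x})^2 = \sum_i (x_i - \mu)^2 - n(\bar{x} - \mu)^2,$$

so that

$$E(s'^2) = \frac{1}{n-1} E\Big\{\sum_i (x_i - \mu)^2 - n(\bar{x} - \mu)^2\Big\}
= \frac{1}{n-1} \Big\{\sum_i \mathrm{var}(x_i) - n\,\mathrm{var}(\bar{x})\Big\}
= \frac{1}{n-1} \Big\{n\sigma^2 - n \times \frac{\sigma^2}{n}\Big\} = \sigma^2. \qquad (17.4)$$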
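The unbiassedness of the divisor n - 1 can be checked exactly on a small case. The sketch below is an illustration only: the argument uses nothing beyond independence and a common variance, so a hypothetical four-value population sampled with replacement is enough, and exact fractions avoid rounding.

```python
from fractions import Fraction
from itertools import product

population = [1, 2, 4, 7]   # hypothetical values, each equally likely
n = 3                       # sample size, drawn with replacement

N = len(population)
mu = Fraction(sum(population), N)
sigma_sq = sum((Fraction(x) - mu) ** 2 for x in population) / N

def sample_variance(sample, divisor):
    """Sum of squared deviations from the sample mean, over the divisor."""
    mean = Fraction(sum(sample), len(sample))
    return sum((Fraction(x) - mean) ** 2 for x in sample) / divisor

# All ordered samples of size n are equally likely under this scheme.
samples = list(product(population, repeat=n))
e_divisor_n_minus_1 = sum(sample_variance(s, n - 1) for s in samples) / len(samples)
e_divisor_n = sum(sample_variance(s, n) for s in samples) / len(samples)

print(sigma_sq, e_divisor_n_minus_1, e_divisor_n)
```

The expectation with divisor n - 1 equals the population variance, while the divisor n gives (n - 1)/n times that value.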




Case 1: $\mu$ unknown, $\sigma$ known

Here we may be required to test the null hypothesis $H_0 : \mu = \mu_0$.
It has been shown in the previous chapter that the test procedure
for $H_0$ in this case is based on the statistic $\sqrt{n}(\bar{x} - \mu_0)/\sigma$, which is
distributed as a normal deviate ($\tau$) under this hypothesis.

(a) For the alternative $H : \mu > \mu_0$, $H_0$ is rejected if for the given
sample $\tau > \tau_\alpha$ (and is accepted otherwise).

(b) For the alternative $H : \mu < \mu_0$, $H_0$ is rejected if for the given
sample $\tau < \tau_{1-\alpha}$ ($= -\tau_\alpha$).

(c) For the alternative $H : \mu \ne \mu_0$, $H_0$ is rejected if for the given
sample $|\tau| > \tau_{\alpha/2}$.

In each case $\alpha$ denotes the chosen level of significance.

As regards the problem of interval estimation of $\mu$, it has been
shown that the limits

$$\bar{x} - \tau_{\alpha/2} \times \frac{\sigma}{\sqrt{n}} \quad \text{and} \quad \bar{x} + \tau_{\alpha/2} \times \frac{\sigma}{\sqrt{n}},$$

computed for the given sample, are the confidence limits to $\mu$ with
confidence coefficient $1 - \alpha$.
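A minimal sketch of Case 1 follows. The readings and the known value sigma = 2.0 are hypothetical, chosen only for illustration, and the function name `z_test_mean` is our own; the normal percentage point comes from the standard library instead of a printed table.

```python
from math import sqrt
from statistics import NormalDist, mean

def z_test_mean(sample, mu0, sigma, alpha=0.05):
    """Two-sided test of H0: mu = mu0 with sigma known, plus the
    100(1 - alpha)% confidence limits to mu."""
    n = len(sample)
    xbar = mean(sample)
    tau = sqrt(n) * (xbar - mu0) / sigma
    tau_half = NormalDist().inv_cdf(1 - alpha / 2)   # upper alpha/2 point
    half_width = tau_half * sigma / sqrt(n)
    return tau, abs(tau) > tau_half, (xbar - half_width, xbar + half_width)

# Hypothetical readings; sigma is ASSUMED known and equal to 2.0.
readings = [10.1, 12.3, 9.8, 11.4, 10.9, 12.0, 11.1, 10.4, 11.8]
tau, reject, limits = z_test_mean(readings, mu0=10.0, sigma=2.0)
print(round(tau, 3), reject, [round(v, 3) for v in limits])
```

For the one-sided alternatives (a) and (b) the same statistic is compared with the upper alpha point, `NormalDist().inv_cdf(1 - alpha)`, or with its negative.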


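Case 2: $\mu$ known, $\sigma$ unknown

Here one may be interested in testing a hypothesis regarding $\sigma$ or
in estimating $\sigma$. A sufficient statistic for $\sigma$ is $\sum (x_i - \mu)^2$ or
$\sqrt{\sum (x_i - \mu)^2}$.

It is seen that $x_i$ is a normal variable with mean $\mu$ and standard
deviation $\sigma$. Hence

$$\sum_i \Big(\frac{x_i - \mu}{\sigma}\Big)^2, \qquad (17.5)$$

being the sum of squares of n independent normal deviates, is
distributed as a $\chi^2$ with df = n.

For testing $H_0 : \sigma = \sigma_0$, we make use of the fact that

$$\chi^2 = \frac{\sum_i (x_i - \mu)^2}{\sigma_0^2}$$

is a $\chi^2$ with df = n under this hypothesis.

(a) For the alternative $H : \sigma > \sigma_0$, $H_0$ is rejected in case for the
given sample $\chi^2 > \chi^2_{\alpha, n}$.

(b) For the alternative $H : \sigma < \sigma_0$, $H_0$ is rejected if for the given
sample $\chi^2 < \chi^2_{1-\alpha, n}$.

(c) For the alternative $H : \sigma \ne \sigma_0$, $H_0$ is rejected if for the given
sample $\chi^2 < \chi^2_{1-\alpha/2, n}$ or $\chi^2 > \chi^2_{\alpha/2, n}$.

As a consistent (but biassed) point estimate of $\sigma$, we have

$$\sqrt{\frac{1}{n} \sum_i (x_i - \mu)^2}.$$

To get a confidence interval for $\sigma$, we note that

$$P\Big[\chi^2_{1-\alpha/2, n} \le \frac{\sum_i (x_i - \mu)^2}{\sigma^2} \le \chi^2_{\alpha/2, n}\Big] = 1 - \alpha,$$

i.e.

$$P\Big[\frac{\sum_i (x_i - \mu)^2}{\chi^2_{\alpha/2, n}} \le \sigma^2 \le \frac{\sum_i (x_i - \mu)^2}{\chi^2_{1-\alpha/2, n}}\Big] = 1 - \alpha.$$

The confidence limits to $\sigma^2$ with confidence coefficient $1 - \alpha$ are,
therefore,

$$\frac{\sum_i (x_i - \mu)^2}{\chi^2_{\alpha/2, n}} \quad \text{and} \quad \frac{\sum_i (x_i - \mu)^2}{\chi^2_{1-\alpha/2, n}}.$$

The confidence limits to $\sigma$ are just the positive square-roots of these
quantities, with the same confidence coefficient, $1 - \alpha$.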


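Case 3: $\mu$ and $\sigma$ both unknown

In this case, $\bar{x}$ and $s'$ are jointly sufficient for $\mu$ and $\sigma$.
Here to test $H_0 : \mu = \mu_0$ or to have confidence limits to $\mu$, one
cannot use the statistic $\sqrt{n}(\bar{x} - \mu)/\sigma$ since $\sigma$ is unknown. $\sigma$ is in this
case replaced by its sample estimate, $s'$. The resulting expression
will be

$$\sqrt{n}(\bar{x} - \mu)/s'.$$

Now, it has been shown in Section 15.7 that

$$\frac{\sum_i (x_i - \bar{x})^2}{\sigma^2} = \frac{(n-1)s'^2}{\sigma^2} \qquad (17.6)$$

is a $\chi^2$ with df = n - 1 and is distributed independently of $\bar{x}$. Thus

$$\frac{\sqrt{n}(\bar{x} - \mu)}{s'} = \frac{\sqrt{n}(\bar{x} - \mu)/\sigma}{\sqrt{\dfrac{(n-1)s'^2}{\sigma^2} \Big/ (n-1)}}, \qquad (17.7)$$

being of the form $\tau \big/ \sqrt{\chi^2/(n-1)}$, where $\chi^2$ has df = n - 1 and is
independent of $\tau$, is distributed as a t with df = n - 1.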


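To test $H_0 : \mu = \mu_0$ we may, therefore, use the statistic

$$t = \sqrt{n}(\bar{x} - \mu_0)/s'$$

with df = n - 1. We shall have to compare t (computed from the
given sample) with $t_{\alpha, n-1}$, or t with $-t_{\alpha, n-1}$, or |t| with $t_{\alpha/2, n-1}$,
according as the alternative of interest is $H : \mu > \mu_0$, $H : \mu < \mu_0$ or
$H : \mu \ne \mu_0$.

In order to obtain confidence limits to $\mu$, we see that

$$P\Big[-t_{\alpha/2, n-1} \le \frac{\sqrt{n}(\bar{x} - \mu)}{s'} \le t_{\alpha/2, n-1}\Big] = 1 - \alpha,$$

i.e.

$$P\Big[\bar{x} - t_{\alpha/2, n-1} \times \frac{s'}{\sqrt{n}} \le \mu \le \bar{x} + t_{\alpha/2, n-1} \times \frac{s'}{\sqrt{n}}\Big] = 1 - \alpha.$$

The 100(1 - $\alpha$)% confidence limits to $\mu$ will, therefore, be

$$\bar{x} - t_{\alpha/2, n-1} \times \frac{s'}{\sqrt{n}} \quad \text{and} \quad \bar{x} + t_{\alpha/2, n-1} \times \frac{s'}{\sqrt{n}},$$

these being computed from the given sample.

In this case we may have also the problem of testing $H_0 : \sigma = \sigma_0$
or the problem of obtaining confidence limits to $\sigma$.
From what has been said above, it is clear that

$$\frac{\sum_i (x_i - \bar{x})^2}{\sigma_0^2} = \frac{(n-1)s'^2}{\sigma_0^2}$$

is, under the hypothesis $H_0$, a $\chi^2$ with df = n - 1. This provides us with
tests for $H_0$. The value of this $\chi^2$, computed from the given sample,
is compared with $\chi^2_{\alpha, n-1}$ or $\chi^2_{1-\alpha, n-1}$, according as the alternative
is $H : \sigma > \sigma_0$ or $H : \sigma < \sigma_0$. For the alternative $H : \sigma \ne \sigma_0$, on the
other hand, the computed value is to be compared with both
$\chi^2_{1-\alpha/2, n-1}$ and $\chi^2_{\alpha/2, n-1}$, $H_0$ being rejected if the computed value
is smaller than the former or exceeds the latter value.

Since

$$P\Big[\chi^2_{1-\alpha/2, n-1} \le \frac{(n-1)s'^2}{\sigma^2} \le \chi^2_{\alpha/2, n-1}\Big] = 1 - \alpha,$$

i.e.

$$P\Big[\frac{(n-1)s'^2}{\chi^2_{\alpha/2, n-1}} \le \sigma^2 \le \frac{(n-1)s'^2}{\chi^2_{1-\alpha/2, n-1}}\Big] = 1 - \alpha,$$

the confidence limits to $\sigma^2$ are

$$\frac{(n-1)s'^2}{\chi^2_{\alpha/2, n-1}} \quad \text{and} \quad \frac{(n-1)s'^2}{\chi^2_{1-\alpha/2, n-1}}.$$

The confidence limits (with the same confidence coefficient, $1 - \alpha$)
to $\sigma$ are, of course, the positive square-roots of these quantities.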


Example 17.4 The following are 12 determinations of the melting
point of a compound (in degrees centigrade) made by an analyst,
the true melting point being 165°C. Would you conclude from
these data that his determinations are free from bias?

164.4   161.4
169.7   162.2
163.9   168.5
162.1   163.4
160.9   162.9
160.8   167.7


The determinations made by the analyst may be said to be
unbiassed if the mean determination in the population, that could be
obtained if he took an infinite number of readings, can be supposed
to be 165 degrees. We have, therefore, to test the null hypothesis
$H_0 : \mu = 165$ against all alternatives $H : \mu \ne 165$.

It will be assumed (a) that the population distribution of deter- 
minations is of the normal type and (b) that the sample observations 
are random and mutually independent. 

Under these assumptions, a test for $H_0$ is provided by the
statistic

$$t = \sqrt{n}(\bar{x} - 165)/s',$$

which has df = (n - 1).

For the given observations,

n = 12, $\sum u_i = 47.9$ and $\sum u_i^2 = 292.83$

(where $u_i = x_i - 160$).

Hence

$$\bar{x} = 160 + \frac{47.9}{12} = 160 + 3.992 = 163.992 \text{ degrees},$$

$$s' = \sqrt{\frac{292.83 - 12 \times (3.992)^2}{11}} = \sqrt{9.2361} = 3.039 \text{ degrees},$$

so that

$$t = \frac{\sqrt{12}\,(163.992 - 165)}{3.039} = -\frac{3.464 \times 1.008}{3.039} = -1.149.$$

From Table IV in Appendix B,

$t_{.025, 11} = 2.201$ and $t_{.005, 11} = 3.106$.

Since for the given sample |t| is smaller than both these tabulated
values, $H_0$ is to be accepted at both the 1% and the 5% level of
significance. In other words, we find no reason to suppose that the
analyst's determinations are not free from bias.
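The arithmetic of Example 17.4 can be reproduced as follows. This is a sketch only: the eighth reading is taken as 163.4, the value implied by the printed mean 163.992, and the two critical values are the tabulated ones quoted above, not computed.

```python
from math import sqrt
from statistics import mean, stdev

# Melting-point determinations; 163.4 is ASSUMED for the eighth reading.
readings = [164.4, 169.7, 163.9, 162.1, 160.9, 160.8,
            161.4, 162.2, 168.5, 163.4, 162.9, 167.7]

n = len(readings)
xbar = mean(readings)
s_prime = stdev(readings)            # divisor n - 1, i.e. s'
t = sqrt(n) * (xbar - 165) / s_prime

t_025_11, t_005_11 = 2.201, 3.106    # tabulated values for df = 11
print(round(xbar, 3), round(s_prime, 3), round(t, 3))
print("reject at 5%:", abs(t) > t_025_11, " reject at 1%:", abs(t) > t_005_11)
```

Small differences in the last decimal place arise because the hand calculation rounds the mean before squaring.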

Example 17.5 The weights at birth for 15 babies born in a
Calcutta hospital are given below. Each figure is correct to the
nearest tenth of a pound.

6.2   5.7   6.9
6.7   4.8   5.0
7.1   6.8   5.8
8.1   7.6   7.9
7.5   7.8   8.5


Give two limits between which the mean weight at birth for all such 
babies is likely to lie. 

Let us denote by x the variable: weight at birth per baby. Our
problem here is then to find, on the basis of the given sample of 15
babies, confidence limits for the population mean of x. We shall
assume (a) that in the population x is normally distributed (with a
mean $\mu$ and a standard deviation $\sigma$, both of which are unknown)
and (b) that the given observations form a random sample from
the distribution.


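Under these assumptions, the 100(1 - $\alpha$)% confidence limits to $\mu$
will be

$$\bar{x} - t_{\alpha/2, n-1} \times \frac{s'}{\sqrt{n}} \quad \text{and} \quad \bar{x} + t_{\alpha/2, n-1} \times \frac{s'}{\sqrt{n}}.$$

In the present case,

n = 15, $\sum x_i = 102.4$ and $\sum x_i^2 = 716.88$.

Hence

$$\bar{x} = 102.4/15 = 6.827 \text{ lb.}$$

and

$$s' = \sqrt{\frac{716.88 - 15 \times (6.827)^2}{14}} = 1.126 \text{ lb.}$$

Again, consulting Table IV of Appendix B, we find that

$t_{.005, 14} = 2.977$.

Hence the 99% confidence limits to $\mu$ are

$$6.827 - 2.977 \times \frac{1.126}{\sqrt{15}} = 6.827 - \frac{3.352}{3.873} = 6.827 - 0.865 = 5.962 \text{ lb.}$$

and 6.827 + 0.865 = 7.692 lb.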

The confidence coefficient being as high as 0.99, one may well
assert that the population mean lies between 5.962 lb. and 7.692 lb.
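The limits of Example 17.5 depend on the data only through n and the two sums, so they can be sketched from those alone. The function name `t_confidence_limits` is our own, and the critical value 2.977 is the tabulated one quoted in the example.

```python
from math import sqrt

def t_confidence_limits(n, sum_x, sum_x_sq, t_crit):
    """Confidence limits xbar -/+ t_crit * s'/sqrt(n), from n and the sums."""
    xbar = sum_x / n
    s_prime = sqrt((sum_x_sq - n * xbar ** 2) / (n - 1))
    half_width = t_crit * s_prime / sqrt(n)
    return xbar - half_width, xbar + half_width

# n, sum of x and sum of x squared as printed; t(.005, 14) from the table.
lower, upper = t_confidence_limits(15, 102.4, 716.88, t_crit=2.977)
print(round(lower, 3), round(upper, 3))
```

The computed limits agree with 5.962 and 7.692 to about the second decimal place; the remaining difference comes from rounding the mean to 6.827 in the hand calculation.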

Example 17.6 A firm manufacturing rivets wants to limit variation
in their length as much as possible. The lengths (in cm.) of
10 rivets manufactured by a new process are:

2.15   1.99   2.05   2.12   2.17
2.01   1.98   2.03   2.25   1.93

In the past, the standard deviation of length of rivets manufactured
by the firm has been 0.145 cm. Examine whether the new process
seems to be superior to the old.

If $\sigma$ be the standard deviation of length for all rivets manufactured
by the new process, then this may be considered superior if
$\sigma < 0.145$. The null hypothesis is then $H_0 : \sigma = 0.145$, which is to be
tested against the alternative $H : \sigma < 0.145$.


Under the usual assumptions, the test will be given by the statistic

$$\chi^2 = \frac{\sum_i (x_i - \bar{x})^2}{(0.145)^2},$$

which is, under $H_0$, a $\chi^2$ with df = n - 1. For the present data,

$$\sum_i (x_i - \bar{x})^2 = \sum_i u_i^2 - \Big(\sum_i u_i\Big)^2 \Big/ n \quad [\text{putting } u = x - 2.00]$$

$$= 0.1372 - \frac{(0.68)^2}{10} = 0.1372 - 0.04624 = 0.09096.$$

Hence

$$\chi^2 = \frac{0.09096}{0.021025} = 4.326.$$

Now, $\chi^2_{.99, 9} = 2.088$ and $\chi^2_{.95, 9} = 3.325$. The observed value is thus
insignificant, and the null hypothesis is to be accepted; i.e., the new
process does not seem to be superior to the old.
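A short sketch of the computation in Example 17.6 is given below. The two critical values are the tabulated lower 5% and 1% points quoted in the example; since the alternative is sigma < 0.145, rejection would require the statistic to fall below them.

```python
lengths = [2.15, 1.99, 2.05, 2.12, 2.17,
           2.01, 1.98, 2.03, 2.25, 1.93]   # rivet lengths in cm.
sigma0 = 0.145                             # standard deviation under H0

n = len(lengths)
xbar = sum(lengths) / n
sum_sq_dev = sum((x - xbar) ** 2 for x in lengths)
chi_sq = sum_sq_dev / sigma0 ** 2          # chi-square with df = n - 1 under H0

chi_sq_95_9, chi_sq_99_9 = 3.325, 2.088    # tabulated lower 5% and 1% points
reject_at_5 = chi_sq < chi_sq_95_9
reject_at_1 = chi_sq < chi_sq_99_9
print(round(sum_sq_dev, 5), round(chi_sq, 3), reject_at_5, reject_at_1)
```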


17.6 Comparison of two univariate normal distributions

Let the distribution of x in each of two populations be normal.
Suppose the mean and the standard deviation of x for one population
are $\mu_1$ and $\sigma_1$, while for the other they are $\mu_2$ and $\sigma_2$, respectively.
Suppose further that $x_{11}, x_{12}, \ldots, x_{1n_1}$ are a random sample from
the first distribution, and $x_{21}, x_{22}, \ldots, x_{2n_2}$ are a random sample
from the second. The first set of observations is also supposed to be
independent of the second set. Then

$$\bar{x}_1 = \sum_{j=1}^{n_1} x_{1j} / n_1$$

and

$$s_1' = \sqrt{\sum_{j=1}^{n_1} (x_{1j} - \bar{x}_1)^2 / (n_1 - 1)}$$

are the mean and the standard deviation of x in the first sample, and

$$\bar{x}_2 = \sum_{j=1}^{n_2} x_{2j} / n_2$$

and

$$s_2' = \sqrt{\sum_{j=1}^{n_2} (x_{2j} - \bar{x}_2)^2 / (n_2 - 1)}$$

are the corresponding statistics for the second sample.


Case 1: $\mu_1$, $\mu_2$ unknown but $\sigma_1$, $\sigma_2$ known

In this case one may be concerned with a comparison between
the population means. One may have to test the hypothesis that $\mu_1$
and $\mu_2$ differ by a specified quantity, say

$$H_0 : \mu_1 - \mu_2 = \xi_0,$$

or one may like to obtain confidence limits for the difference $\mu_1 - \mu_2$.
It may be seen that $\bar{x}_1 - \bar{x}_2$, being a linear function of normal
variables, is itself normally distributed. It has mean

$$E(\bar{x}_1 - \bar{x}_2) = E(\bar{x}_1) - E(\bar{x}_2) = \mu_1 - \mu_2 \qquad (17.8)$$

and variance

$$\mathrm{var}(\bar{x}_1 - \bar{x}_2) = \mathrm{var}(\bar{x}_1) + \mathrm{var}(\bar{x}_2) = \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2},$$

the covariance term being zero since $\bar{x}_1$ and $\bar{x}_2$ are independent.
As such,

$$\frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}} \qquad (17.9)$$

is distributed as a standard normal variable.

To test

$$H_0 : \mu_1 - \mu_2 = \xi_0$$

we make use of the statistic

$$\tau = \frac{(\bar{x}_1 - \bar{x}_2) - \xi_0}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}}, \qquad (17.10)$$

which is distributed as a standard normal variable ($\tau$) under $H_0$. $H_0$
is to be rejected on the basis of the given samples if $\tau > \tau_\alpha$ or if
$\tau < -\tau_\alpha$, according as the alternative hypothesis in which the
experimenter is interested is $H : \mu_1 - \mu_2 > \xi_0$ or $H : \mu_1 - \mu_2 < \xi_0$.
On the other hand, if the alternative is $H : \mu_1 - \mu_2 \ne \xi_0$, $H_0$ is to
be rejected when $|\tau| > \tau_{\alpha/2}$. In the commonest cases, the null
hypothesis will be $H_0 : \mu_1 = \mu_2$, for which $\xi_0 = 0$.

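If the problem is one of interval estimation, then it will be found,
following the usual mode of argument, that the confidence limits to
$\mu_1 - \mu_2$ (with confidence coefficient $1 - \alpha$) are

$$(\bar{x}_1 - \bar{x}_2) - \tau_{\alpha/2} \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}} \quad \text{and} \quad (\bar{x}_1 - \bar{x}_2) + \tau_{\alpha/2} \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}.$$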

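Case 2: $\mu_1$, $\mu_2$ known but $\sigma_1$, $\sigma_2$ unknown

Here it may be necessary to test the hypothesis that the ratio of
the two unknown standard deviations has a specified value, say
$H_0 : \sigma_1/\sigma_2 = \xi_0$, or to set confidence limits to this ratio.

Since

$$\sum_{j=1}^{n_1} (x_{1j} - \mu_1)^2 / \sigma_1^2 \quad \text{and} \quad \sum_{j=1}^{n_2} (x_{2j} - \mu_2)^2 / \sigma_2^2$$

are independent $\chi^2$s with $n_1$ and $n_2$ degrees of freedom, respectively,

$$\frac{\sum_j (x_{1j} - \mu_1)^2 / n_1\sigma_1^2}{\sum_j (x_{2j} - \mu_2)^2 / n_2\sigma_2^2} \qquad (17.11)$$

is distributed as an F with ($n_1$, $n_2$) degrees of freedom.
Under the hypothesis $H_0 : \sigma_1/\sigma_2 = \xi_0$, therefore,

$$F = \frac{1}{\xi_0^2} \times \frac{\sum_j (x_{1j} - \mu_1)^2 / n_1}{\sum_j (x_{2j} - \mu_2)^2 / n_2}$$

is an F with df = ($n_1$, $n_2$). This provides a test for $H_0$. When the
alternative is $H : \sigma_1/\sigma_2 > \xi_0$, $H_0$ is to be rejected if for the given samples

$$F > F_{\alpha;\, n_1, n_2}.$$

If the alternative is $H : \sigma_1/\sigma_2 < \xi_0$, $H_0$ is to be rejected if for the
given samples

$$F < F_{1-\alpha;\, n_1, n_2}, \quad \text{i.e. if} \quad \frac{1}{F} > F_{\alpha;\, n_2, n_1}.$$

Lastly, when the alternative is $H : \sigma_1/\sigma_2 \ne \xi_0$, $H_0$ is to be rejected
if the samples in hand give either

$$F < F_{1-\alpha/2;\, n_1, n_2}, \quad \text{i.e.} \quad \frac{1}{F} > F_{\alpha/2;\, n_2, n_1},$$

or

$$F > F_{\alpha/2;\, n_1, n_2}.$$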


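The commonest form of the null hypothesis will be $H_0 : \sigma_1 = \sigma_2$,
for which $\xi_0 = 1$, and here

$$F = \frac{\sum_j (x_{1j} - \mu_1)^2 / n_1}{\sum_j (x_{2j} - \mu_2)^2 / n_2}$$

simply.

For the purpose of setting confidence limits to $\sigma_1/\sigma_2$, we see that

$$P\Big[F_{1-\alpha/2;\, n_1, n_2} \le \frac{\sum_j (x_{1j} - \mu_1)^2 / n_1}{\sum_j (x_{2j} - \mu_2)^2 / n_2} \times \frac{\sigma_2^2}{\sigma_1^2} \le F_{\alpha/2;\, n_1, n_2}\Big] = 1 - \alpha,$$

i.e.

$$P\Big[\frac{1}{F_{\alpha/2;\, n_1, n_2}} \times \frac{\sum_j (x_{1j} - \mu_1)^2 / n_1}{\sum_j (x_{2j} - \mu_2)^2 / n_2} \le \frac{\sigma_1^2}{\sigma_2^2} \le F_{\alpha/2;\, n_2, n_1} \times \frac{\sum_j (x_{1j} - \mu_1)^2 / n_1}{\sum_j (x_{2j} - \mu_2)^2 / n_2}\Big] = 1 - \alpha.$$

The confidence limits to $\sigma_1^2/\sigma_2^2$ (with confidence coefficient $1 - \alpha$) will,
therefore, be

$$\frac{1}{F_{\alpha/2;\, n_1, n_2}} \times \frac{\sum_j (x_{1j} - \mu_1)^2 / n_1}{\sum_j (x_{2j} - \mu_2)^2 / n_2} \quad \text{and} \quad F_{\alpha/2;\, n_2, n_1} \times \frac{\sum_j (x_{1j} - \mu_1)^2 / n_1}{\sum_j (x_{2j} - \mu_2)^2 / n_2}.$$

The corresponding limits to $\sigma_1/\sigma_2$ will naturally be the positive
square-roots of these quantities.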


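Case 3: Means and standard deviations all unknown

We shall first consider methods of testing for the difference of the
two means and of setting confidence limits to this difference.

For the sake of simplicity, we shall assume that the two unknown
standard deviations are equal. Now, if $\sigma$ denotes the common standard
deviation, then

$$\frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sigma \sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}}$$

is a standard normal variable, while

$$\frac{(n_1 - 1)s_1'^2}{\sigma^2} + \frac{(n_2 - 1)s_2'^2}{\sigma^2} = \frac{\sum_j (x_{1j} - \bar{x}_1)^2}{\sigma^2} + \frac{\sum_j (x_{2j} - \bar{x}_2)^2}{\sigma^2},$$

which is the sum of two independent $\chi^2$s, one with df = $n_1 - 1$ and
the other with df = $n_2 - 1$, is itself a $\chi^2$ with df = $n_1 + n_2 - 2$. Hence
denoting by $s'^2$ the pooled variance of the two samples, so that

$$s'^2 = \frac{(n_1 - 1)s_1'^2 + (n_2 - 1)s_2'^2}{n_1 + n_2 - 2},$$

we have

$$\frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{s' \sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}} = \frac{\dfrac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sigma \sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}}}{\sqrt{\dfrac{(n_1 - 1)s_1'^2 + (n_2 - 1)s_2'^2}{\sigma^2 (n_1 + n_2 - 2)}}}, \qquad (17.12)$$

a quantity of the form $\tau \big/ \sqrt{\chi^2/(n_1 + n_2 - 2)}$, where $\chi^2$ is independent of
$\tau$ and has df = $n_1 + n_2 - 2$. As such, the quantity on the left-hand
side of (17.12) is distributed as a t with df = $n_1 + n_2 - 2$.
A test for $H_0 : \mu_1 - \mu_2 = \xi_0$ is then given by the statistic

$$t = \frac{(\bar{x}_1 - \bar{x}_2) - \xi_0}{s' \sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}}$$

(called Fisher's t), with df = $n_1 + n_2 - 2$. For acceptance or rejection
of the hypothesis $H_0$, one will have to compare the computed value
of t with the appropriate tabulated value, keeping in view the
alternative hypothesis.

Following the usual procedure, it can be found that the confidence
limits to $\mu_1 - \mu_2$ are

$$(\bar{x}_1 - \bar{x}_2) - t_{\alpha/2,\, n_1 + n_2 - 2} \times s' \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$$

and

$$(\bar{x}_1 - \bar{x}_2) + t_{\alpha/2,\, n_1 + n_2 - 2} \times s' \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}.$$

Obviously, in both cases we are using $s'^2$ as the estimate of the
common variance $\sigma^2$.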

Next, consider the problem of testing a hypothesis regarding the
ratio $\sigma_1/\sigma_2$ or the problem of setting confidence limits to the ratio.
The difference between this problem and the corresponding problem
mentioned in Case 2 may be noted. Since $\mu_1$ and $\mu_2$ are unknown
in the present case, they are replaced by their estimates, $\bar{x}_1$ and $\bar{x}_2$,
and we use the fact that

$$\frac{\sum_j (x_{1j} - \bar{x}_1)^2 / (n_1 - 1)\sigma_1^2}{\sum_j (x_{2j} - \bar{x}_2)^2 / (n_2 - 1)\sigma_2^2} = \frac{s_1'^2 / \sigma_1^2}{s_2'^2 / \sigma_2^2}$$

is an F with ($n_1 - 1$, $n_2 - 1$) degrees of freedom. For testing
$H_0 : \sigma_1/\sigma_2 = \xi_0$, we still use the F statistic, but now

$$F = \frac{1}{\xi_0^2} \times \frac{s_1'^2}{s_2'^2}, \qquad (17.13)$$

with ($n_1 - 1$, $n_2 - 1$) degrees of freedom.
The confidence limits to $\sigma_1^2/\sigma_2^2$ will now be

$$\frac{1}{F_{\alpha/2;\, n_1 - 1, n_2 - 1}} \times \frac{s_1'^2}{s_2'^2} \quad \text{and} \quad F_{\alpha/2;\, n_2 - 1, n_1 - 1} \times \frac{s_1'^2}{s_2'^2}.$$

The corresponding limits to $\sigma_1/\sigma_2$ will be the positive square-roots of
these quantities.


17.6a The Fisher-Behrens problem

For the case when the means ($\mu_1$, $\mu_2$) and variances ($\sigma_1^2$, $\sigma_2^2$)
of two univariate normal distributions are all unknown, we have
considered in Section 17.6 the problem of testing a hypothesis
regarding, or that of having an interval estimate for, $\mu_1 - \mu_2$ under
the assumption that $\sigma_1^2$ and $\sigma_2^2$, though unknown individually, are
known to be equal. Theoretical investigations in this area indicate
that the procedures suggested in that section, based on taking

$$\frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{s' \sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}}$$

as a t-statistic with df = $n_1 + n_2 - 2$, may be applied even when $\sigma_1^2$ and
$\sigma_2^2$ are unequal, provided the ratio $\sigma_1^2/\sigma_2^2$ does not differ from unity
by more than 0.4. In the hypothesis-testing situation, for example,
the level of significance will remain virtually unaffected.

We shall in the present section devote attention to the situation
where the assumption of homoscedasticity (i.e. equality of dispersion)
is untenable, requiring the use of alternative procedures. Various
solutions have been offered for this problem (called the Fisher-Behrens
problem), but none of them can be said to be wholly satisfactory.
We give here a brief account of the major work done in this area.
Although we shall specifically have in view the hypothesis-testing
problem, the reader should be able to readily adapt the solutions
to the task of interval estimation.


I. The fiducial approach

Intuitively, it would seem that the test of the hypothesis
$H_0 : \mu_1 - \mu_2 = \xi_0$, or a family of confidence intervals for $\mu_1 - \mu_2$, should
be based on the function

$$u = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{s_1'^2}{n_1} + \dfrac{s_2'^2}{n_2}}}, \qquad (17.14)$$

in line with, say, (17.10).

Fisher's procedure is based on this statistic, but in attacking the
problem he uses the controversial fiducial approach. In this approach,
the unknown parameters are themselves looked upon as random
variables and are assigned a probability distribution in the light of
the available sample observations. Since

$$t_i = \sqrt{n_i}\,(\bar{x}_i - \mu_i)/s_i'$$

is, for each i, a t-statistic, one may write

$$u = t_1 \cos\psi - t_2 \sin\psi, \qquad (17.15)$$

where

$$\tan\psi = \frac{s_2'/\sqrt{n_2}}{s_1'/\sqrt{n_1}}.$$

For given $\psi$, the fiducial distribution of u has been derived.
Although the distribution does not have a simple form, its percentage
points have been given in Tables VI, VI$_1$ and VI$_2$ of Statistical
Tables by Fisher and Yates. These may be used in testing for or
setting limits (fiducial limits) to $\mu_1 - \mu_2$.

We do not recommend this approach, however, for the simple 


reason that fiducial probability cannot be looked upon as long-run 
relative frequency. 




II. Neyman's approach and Scheffé's approach

Suppose, with no loss of generality, that $n_1 \le n_2$. Neyman's
suggestion is to pair off the observations $x_{1j}$ (j = 1, 2, ..., $n_1$) of
the first sample with $n_1$ of the observations in the second sample at
random and then to make a paired t-test (in the manner of Section
17.7) with df = $n_1 - 1$. Supposing $x_{1j}$ is paired with $x_{2j}'$, where $x_{2j}'$
is one of the values $x_{21}, x_{22}, \ldots, x_{2n_2}$, if we put

$$z_j = x_{1j} - x_{2j}' \quad (j = 1, 2, \ldots, n_1), \qquad (17.16)$$

the test for $H_0 : \mu_1 - \mu_2 = \xi_0$ will be based on the statistic

$$(\bar{z} - \xi_0)\sqrt{n_1} / s_z', \qquad (17.17)$$

with

$$\bar{z} = \sum_{j=1}^{n_1} z_j / n_1 \quad \text{and} \quad s_z'^2 = \frac{1}{n_1 - 1} \sum_{j=1}^{n_1} (z_j - \bar{z})^2,$$

which is, under $H_0$, distributed as a t with df = $n_1 - 1$.

Scheffé's approach is a variant of Neyman's in the sense that
while he too suggests a pairing off of the $n_1$ observations in the first
sample with $n_1$ of the second, his procedure is based on the
quantities

$$d_j = x_{1j} - \sqrt{\frac{n_1}{n_2}}\, x_{2j} + \frac{1}{\sqrt{n_1 n_2}} \sum_{j=1}^{n_1} x_{2j} - \frac{1}{n_2} \sum_{j=1}^{n_2} x_{2j}. \qquad (17.18)$$

Since the $d_j$ are normally distributed, with

$$E(d_j) = \mu_1 - \mu_2, \text{ for all } j, \qquad (17.18a)$$

$$\mathrm{var}(d_j) = \sigma_1^2 + n_1\sigma_2^2/n_2, \text{ for all } j, \qquad (17.18b)$$

and

$$\mathrm{cov}(d_j, d_{j'}) = 0, \text{ for all } j, j' \ (j \ne j'), \qquad (17.18c)$$

$H_0$ is now tested by means of the statistic

$$(\bar{d} - \xi_0)\sqrt{n_1} / s_d', \qquad (17.19)$$

with

$$\bar{d} = \sum_{j=1}^{n_1} d_j / n_1 \quad \text{and} \quad s_d'^2 = \frac{1}{n_1 - 1} \sum_{j=1}^{n_1} (d_j - \bar{d})^2,$$

which again is distributed, under $H_0$, as a t with df = $n_1 - 1$.
Scheffé has shown that among all tests based on the t distribution
with df = $n_1 - 1$, his test is the most powerful.
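Scheffé's quantities can be sketched as below. The two samples are hypothetical, the pairing is simply the order in which the second sample is listed (in practice it would be made at random), and the function names are our own. The formula used is d_j = x_1j - sqrt(n1/n2) x_2j + (sum of the n1 paired x_2j)/sqrt(n1 n2) - (mean of all x_2j).

```python
from math import sqrt
from statistics import mean, stdev

def scheffe_d(x1, x2):
    """Scheffe's quantities d_j; requires len(x1) <= len(x2).
    The first len(x1) entries of x2 are taken as the paired observations."""
    n1, n2 = len(x1), len(x2)
    paired_sum = sum(x2[:n1])
    mean2 = sum(x2) / n2
    return [x1[j] - sqrt(n1 / n2) * x2[j] + paired_sum / sqrt(n1 * n2) - mean2
            for j in range(n1)]

def scheffe_t(x1, x2, xi0=0.0):
    """Statistic (dbar - xi0) * sqrt(n1) / s'_d and its degrees of freedom."""
    d = scheffe_d(x1, x2)
    n1 = len(d)
    return (mean(d) - xi0) * sqrt(n1) / stdev(d), n1 - 1

# Hypothetical samples, for illustration only.
sample1 = [5.1, 4.8, 6.0, 5.5]
sample2 = [4.2, 5.0, 3.9, 4.7, 4.4, 5.2]
d = scheffe_d(sample1, sample2)
t, df = scheffe_t(sample1, sample2)
print([round(v, 4) for v in d], round(t, 4), df)
```

A useful check is that the mean of the d_j always equals the difference of the two sample means, whatever the pairing; only the estimated standard error depends on how the observations are paired.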


However, both Neyman's and Scheffé's procedures are logically
defective in that the acceptance or rejection of the hypothesis
(or the interval estimate obtained) is determined not only by the
data but also by the rather extraneous manner in which the observations
in one sample are paired off with observations in the other. If
$n_1 \ne n_2$, Neyman's approach also requires that some observations
in the bigger sample be discarded.


III. Approach of Smith, Satterthwaite, Dixon and Massey

This approach hinges on the approximation of a linear combination
of independent $\chi^2$s with positive coefficients by a multiple of
a $\chi^2$ with appropriate df in such a way that both the mean and
the variance remain unchanged.


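Hence if we make use of the function u of (17.14) and write it as,
approximately,

$$\tau \Big/ \sqrt{c\chi^2/\nu},$$

where

$$\tau = \{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)\} \Big/ \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$$

and $c\chi^2/\nu$ approximates

$$\Big(\frac{s_1'^2}{n_1} + \frac{s_2'^2}{n_2}\Big) \Big/ \Big(\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}\Big) = \Big[\frac{\sigma_1^2}{n_1(n_1 - 1)}\chi_1^2 + \frac{\sigma_2^2}{n_2(n_2 - 1)}\chi_2^2\Big] \Big/ \Big(\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}\Big) = c_1\chi_1^2 + c_2\chi_2^2$$

(say), $\chi_1^2$ and $\chi_2^2$ having ($n_1 - 1$) and ($n_2 - 1$) degrees of freedom,
respectively, then we have, on taking expectations,

$$c = c_1(n_1 - 1) + c_2(n_2 - 1) = 1 \qquad (17.20a)$$

and, on taking variances,

$$\frac{2c^2}{\nu} = 2\,[c_1^2(n_1 - 1) + c_2^2(n_2 - 1)],$$

or

$$\nu = \Big(\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}\Big)^2 \Big/ \Big[\frac{\sigma_1^4}{n_1^2(n_1 - 1)} + \frac{\sigma_2^4}{n_2^2(n_2 - 1)}\Big]. \qquad (17.20b)$$

Consequently, the function u is taken to be distributed as,
approximately, a t-statistic with

$$\hat{\nu} = \Big(\frac{s_1'^2}{n_1} + \frac{s_2'^2}{n_2}\Big)^2 \Big/ \Big[\frac{s_1'^4}{n_1^2(n_1 - 1)} + \frac{s_2'^4}{n_2^2(n_2 - 1)}\Big] \qquad (17.21a)$$

degrees of freedom.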


Smith and Satterthwaite recommend the use of this procedure in
testing for, or setting confidence limits to, the difference $\mu_1 - \mu_2$.
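The approximate degrees of freedom of this approach, namely (s1'^2/n1 + s2'^2/n2)^2 divided by [s1'^4/(n1^2 (n1 - 1)) + s2'^4/(n2^2 (n2 - 1))], can be sketched as below. The function name `approx_df` and the figures passed to it are hypothetical.

```python
def approx_df(s1_sq, n1, s2_sq, n2):
    """Approximate df for the statistic u when the variances are unequal.
    s1_sq and s2_sq are the sample variances with divisors n1 - 1, n2 - 1."""
    a, b = s1_sq / n1, s2_sq / n2
    return (a + b) ** 2 / (a ** 2 / (n1 - 1) + b ** 2 / (n2 - 1))

# Equal variances and equal sample sizes: the value is n1 + n2 - 2.
print(approx_df(4.0, 10, 4.0, 10))

# Very unequal variances (hypothetical figures): the value moves towards
# the df of the sample with the larger s'^2/n.
print(round(approx_df(1.0, 5, 100.0, 20), 3))
```

The result is generally fractional; it always lies between the smaller of n1 - 1 and n2 - 1 and the pooled value n1 + n2 - 2, so interpolation in the t table may be needed.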

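Welch and also Dixon and Massey have shown that it would be
more correct to use for the df of the approximating t-statistic the
formula

$$\hat{\nu} = \Big(\frac{s_1'^2}{n_1} + \frac{s_2'^2}{n_2}\Big)^2 \Big/ \Big[\frac{s_1'^4}{n_1^2(n_1 + 1)} + \frac{s_2'^4}{n_2^2(n_2 + 1)}\Big] - 2. \qquad (17.21b)$$
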
IV. Welch's later approach

In a later piece of research, Welch looked for a function
$h(s_1'^2, s_2'^2, n_1, n_2, \alpha)$ such that

$$P\{u > h(s_1'^2, s_2'^2, n_1, n_2, \alpha)\} = \alpha. \qquad (17.22)$$

Welch has found a series expansion for h in terms of

$$c = \frac{s_1'^2}{n_1} \Big/ \Big(\frac{s_1'^2}{n_1} + \frac{s_2'^2}{n_2}\Big), \quad \nu_1 = n_1 - 1, \quad \nu_2 = n_2 - 1 \quad \text{and} \quad \tau_\alpha,$$

the upper $\alpha$-point of the standard normal distribution. Following
further work by Welch and others, h has been tabulated as a function
of $\nu_1$, $\nu_2$ and c, for $\alpha$ = 0.05, 0.025, 0.01 and 0.005. These tabulated
values will be found in Table 11 of Biometrika Tables, Vol. I.

Of the four approaches outlined above, Welch's later approach
seems to be the best. In practice, one should make a preliminary test
for $H_{01} : \sigma_1^2 = \sigma_2^2$; in case $H_{01}$ is accepted, one should make the test
for $H_0 : \mu_1 - \mu_2 = \xi_0$ by means of Fisher's t-statistic (17.12), and in
case $H_{01}$ is rejected one should instead test $H_0$ by means of one of
the above-mentioned procedures (preferably Welch's procedure).
But here also one would do well to remember what was said in the
first paragraph of this section.

The problems that generally arise in relation to one univariate 
normal distribution and two univariate normal distributions have 
been discussed above. The more general case of k univariate 
normal distributions will be briefly discussed in the next chapter. 




Example 17.7 The following data are the lives in hours of 2
batches of electric lamps. Test whether there is a significant
difference between the batches in respect of average length of life.

Batch 1     Batch 2
1,505       1,799
1,556       1,618
1,801       1,604
1,629       1,655
1,644       1,708
1,607       1,675
1,825       1,728
1,748


Let us denote by $\mu_1$ and $\mu_2$ the average lives for lamps in the
populations corresponding to Batch 1 and Batch 2, respectively. We
have to test the null hypothesis $H_0 : \mu_1 = \mu_2$ against the alternative
$H : \mu_1 \ne \mu_2$.

We assume (a) that in each population the life of a bulb is
normally distributed, (b) that the unknown standard deviations of
the two distributions are equal and (c) that the sample observations
in each set constitute a random sample, one set being independent
of the other. The test for $H_0$ is then provided by the statistic

$$t = \frac{\bar{x}_1 - \bar{x}_2}{s' \sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}},$$

which is a t with df = $n_1 + n_2 - 2$.
For the given samples,

$\sum u_{1j} = 515$, $\sum u_{2j} = 587$,
$\sum u_{1j}^2 = 126{,}717$ and $\sum u_{2j}^2 = 76{,}639$

(putting u = x - 1,600), and

$$\bar{x}_1 = 1{,}600 + 515/8 = 1{,}664.375, \qquad \bar{x}_2 = 1{,}600 + 587/7 = 1{,}683.857,$$

$$s' = \Big\{\frac{(n_1 - 1)s_1'^2 + (n_2 - 1)s_2'^2}{n_1 + n_2 - 2}\Big\}^{1/2}
= \Big[\frac{\{126{,}717 - 8 \times (64.375)^2\} + \{76{,}639 - 7 \times (83.857)^2\}}{13}\Big]^{1/2}$$

$$= \Big(\frac{120{,}978.900}{13}\Big)^{1/2} = \sqrt{9{,}306.069} = 96.468,$$

so that

$$t = \frac{1{,}664.375 - 1{,}683.857}{96.468 \sqrt{\dfrac{1}{8} + \dfrac{1}{7}}} = -\frac{19.482}{96.468 \times 0.5175} = -0.390.$$

Since $t_{.025, 13} = 2.160$ and $t_{.005, 13} = 3.012$, the observed t leads to the
acceptance of $H_0$. In other words, the observed difference in mean
life is found to be insignificant.
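Fisher's t for Example 17.7 can be reproduced directly from the two batches, as sketched below. The critical values are the tabulated ones quoted above; nothing else is assumed beyond the equal-variance model of the example.

```python
from math import sqrt
from statistics import mean, variance

batch1 = [1505, 1556, 1801, 1629, 1644, 1607, 1825, 1748]
batch2 = [1799, 1618, 1604, 1655, 1708, 1675, 1728]
n1, n2 = len(batch1), len(batch2)

# Pooled variance: sample variances (divisors n1 - 1 and n2 - 1)
# weighted by their degrees of freedom.
pooled_var = ((n1 - 1) * variance(batch1)
              + (n2 - 1) * variance(batch2)) / (n1 + n2 - 2)
s_pooled = sqrt(pooled_var)

t = (mean(batch1) - mean(batch2)) / (s_pooled * sqrt(1 / n1 + 1 / n2))
df = n1 + n2 - 2

t_025_13, t_005_13 = 2.160, 3.012   # tabulated values for df = 13
print(round(s_pooled, 3), round(t, 3), df, abs(t) > t_025_13)
```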

Example 17.8 Two experimenters, A and B, take repeated
measurements on the length of a copper wire. On the basis of the
data obtained by them, which are given below, test whether B's
measurements are more accurate than A's. (It may be supposed
that the readings taken by both are unbiassed.)

A's measurements        B's measurements
(in mm.)                (in mm.)
12.47   12.44           12.06   12.24
11.90   12.13           12.23   12.46
12.77   11.86           12.46   12.39
11.96   12.25           11.98
12.78   12.29           12.22

Since the readings of both the experimenters are unbiassed,
B's measurements may be considered more accurate if they have a
smaller population standard deviation than A's measurements. The
null hypothesis is then

$$H_0 : \sigma_1 = \sigma_2,$$

to be tested against the alternative $H : \sigma_1 > \sigma_2$.
Under the usual assumptions, the test is given by

$$F = \frac{s_1'^2}{s_2'^2}, \text{ with } (n_1 - 1,\ n_2 - 1) \text{ degrees of freedom.}$$

Here

$$s_1'^2 = \frac{1}{n_1 - 1}\Big\{\sum_j x_{1j}^2 - n_1\bar{x}_1^2\Big\} = \frac{1}{n_1 - 1}\Big\{\sum_j u_{1j}^2 - \frac{(\sum_j u_{1j})^2}{n_1}\Big\}$$

and

$$s_2'^2 = \frac{1}{n_2 - 1}\Big\{\sum_j u_{2j}^2 - \frac{(\sum_j u_{2j})^2}{n_2}\Big\},$$

where we take u = x - 12.00.
For the given samples,

$\sum_j u_{1j} = 2.85$, $\sum_j u_{2j} = 2.14$,
$\sum_j u_{1j}^2 = 1.8105$, $\sum_j u_{2j}^2 = 0.7962$,

$$s_1'^2 = \frac{1}{9}\Big\{1.8105 - \frac{(2.85)^2}{10}\Big\} = \frac{1}{9}\{1.8105 - 0.8122\} = 0.1109$$

and

$$s_2'^2 = \frac{1}{7}\Big\{0.7962 - \frac{(2.14)^2}{8}\Big\} = \frac{1}{7}\{0.7962 - 0.5724\} = 0.03197.$$

Hence

$$F = \frac{0.1109}{0.03197} = 3.469, \text{ with df} = (9, 7).$$

The tabulated values are

$F_{.05;\, 9, 7} = 3.68$ and $F_{.01;\, 9, 7} = 6.72$.


The observed F is thus insignificant at both the levels and $H_0$,
therefore, should be accepted. Thus we find no reason to suppose
that B's measurements are more accurate than A's.
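The variance ratio of Example 17.8 can be sketched from the printed sums alone (with u = x - 12.00). The helper name `variance_from_sums` is our own, and the critical values are the tabulated ones quoted above.

```python
def variance_from_sums(sum_u, sum_u_sq, n):
    """Sample variance with divisor n - 1, from the sum and the sum of
    squares of the coded values u."""
    return (sum_u_sq - sum_u ** 2 / n) / (n - 1)

# Printed sums for A (n1 = 10) and B (n2 = 8).
s1_sq = variance_from_sums(2.85, 1.8105, 10)
s2_sq = variance_from_sums(2.14, 0.7962, 8)
F = s1_sq / s2_sq                      # df = (9, 7) under H0

F_05_9_7, F_01_9_7 = 3.68, 6.72        # tabulated upper 5% and 1% points
print(round(s1_sq, 4), round(s2_sq, 5), round(F, 3), F > F_05_9_7)
```

Subtracting a constant from every reading leaves the variance unchanged, which is why the coded values u may be used in place of x.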


Example 17.9 The data below relate to per cent shrinkage in a
nylon fibre as a result of temperature tests at 125°C and 150°C.
(In the manufacture of the fibre, the material while still in the form
of a continuous flow is subjected to high temperature with a view
to improving its shrinkage properties.) Test whether the shrinkage
at 150°C is greater than that at 125°C.

Per cent shrinkage      Per cent shrinkage
at 125°C                at 150°C
3.48   3.49             3.88   3.45
3.58   3.61             3.72   4.06
3.54   3.67             3.96   3.65
3.61   3.62             4.01   4.08
3.57   3.69             3.59
3.65   3.50             3.81

We shall assume that the per cent shrinkage (x) is normally
distributed in the population of readings for each temperature level and
that the two given sets of readings are independent random samples
from the two distributions, say $N(\mu_1, \sigma_1^2)$ and $N(\mu_2, \sigma_2^2)$.

Our problem is to test the hypothesis $H_0 : \mu_1 = \mu_2$ against the
alternative $H : \mu_1 < \mu_2$.


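Here we have, taking u = 100(x - 3),

$n_1 = 12$, $n_2 = 10$,
$\sum_j u_{1j} = 701$, $\sum_j u_{2j} = 821$,
$\sum_j u_{1j}^2 = 41495$, $\sum_j u_{2j}^2 = 71537$,

so that

$\bar{u}_1 = 58.4167$, $\bar{u}_2 = 82.1$,

$$s_1'^2 = \frac{1}{11}\{41495 - 12 \times (58.4167)^2\} = \frac{1}{11}(41495 - 40950.1301) = 544.8699/11 = 49.5336$$

and

$$s_2'^2 = \frac{1}{9}\{71537 - 10 \times (82.1)^2\} = \frac{1}{9}(71537 - 67404.1) = 4132.9/9 = 459.2111.$$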


Since the two sample variances are widely different (pointing to
a real difference between $\sigma_1^2$ and $\sigma_2^2$), our problem is to be regarded
as one of the Fisher-Behrens kind.


We have, for our data,

$$u = \frac{\bar{u}_1 - \bar{u}_2}{\sqrt{\dfrac{s_1'^2}{n_1} + \dfrac{s_2'^2}{n_2}}} = \frac{58.4167 - 82.1}{\sqrt{\dfrac{49.5336}{12} + \dfrac{459.2111}{10}}} = -\frac{23.6833}{\sqrt{4.1278 + 45.9211}}$$

$$= -23.6833/\sqrt{50.0489} = -23.6833/7.0745 = -3.3477,$$

while

$$c = \frac{s_1'^2/n_1}{s_1'^2/n_1 + s_2'^2/n_2} = 4.1278/50.0489 = 0.0825.$$

From Table 11 of Biometrika Tables, Vol. 1, the lower 5% and
lower 1% values of u are found to be -1.756 and -2.615. Since
our value is smaller than both, the difference between the two sample
means is to be regarded as significant; i.e., the null hypothesis
$H_0 : \mu_1 = \mu_2$ is to be rejected in the light of the data.
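The two quantities needed to enter Welch's table in Example 17.9, the statistic u and the ratio c, can be sketched from the raw readings. The lower percentage points -1.756 and -2.615 are those quoted from the table, not computed here.

```python
from math import sqrt
from statistics import mean, variance

at_125 = [3.48, 3.58, 3.54, 3.61, 3.57, 3.65,
          3.49, 3.61, 3.67, 3.62, 3.69, 3.50]
at_150 = [3.88, 3.72, 3.96, 4.01, 3.59, 3.81,
          3.45, 4.06, 3.65, 4.08]

# Coding used in the example: 100 * (x - 3).
coded_1 = [100 * (x - 3) for x in at_125]
coded_2 = [100 * (x - 3) for x in at_150]

a = variance(coded_1) / len(coded_1)   # s1'^2 / n1
b = variance(coded_2) / len(coded_2)   # s2'^2 / n2
u = (mean(coded_1) - mean(coded_2)) / sqrt(a + b)
c = a / (a + b)

lower_5, lower_1 = -1.756, -2.615      # tabulated lower 5% and 1% values
print(round(u, 4), round(c, 4), u < lower_5, u < lower_1)
```

The statistic u is unaffected by the coding, since the factor 100 cancels between numerator and denominator and the shift by 3 cancels in the difference of means.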


17.7 Problems relating to a bivariate normal distribution

Suppose in a given population the variables x and y are distributed
in the bivariate normal form with means $\mu_x$ and $\mu_y$, standard
deviations $\sigma_x$ and $\sigma_y$ and correlation coefficient $\rho$. Let $(x_1, y_1)$,
$(x_2, y_2)$, ..., $(x_n, y_n)$ be a random sample of size n drawn from
this distribution. We shall assume that all the parameters are
unknown.

(a) Test for correlation coefficient

Here the sample correlation coefficient is

$$r = \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_i (x_i - \bar{x})^2 \sum_i (y_i - \bar{y})^2}},$$

where $\bar{x}$ and $\bar{y}$ are the sample means. When $\rho = 0$, the sampling
distribution of r assumes a simple form (vide Exercise 17.6) and in
that case

$$r\sqrt{n - 2} \Big/ \sqrt{1 - r^2} \qquad (17.23)$$

can be shown to be distributed as a t with df = n - 2. This fact
provides us with a test for $H_0 : \rho = 0$. As to the general hypothesis
$H_0 : \rho = \rho_0$, an exact test becomes difficult, because for $\rho \ne 0$ the
sample correlation has a complicated sampling distribution. An
approximate test, which may be used for even moderately large n,
will be given in Chapter 19.


Example 17.10 The correlation coefficient between nasal length
and stature for a group of 20 Indian adult males was found to be
0.203. Test whether there is any correlation between the characters
in the population.

The null hypothesis here is $H_0 : \rho = 0$, to be tested against all
alternatives. As we have seen, under certain assumptions, which may
be considered legitimate here, the test is given by

$$t = r\sqrt{n - 2} \Big/ \sqrt{1 - r^2},$$

which has df = n - 2.
Here df = 20 - 2 = 18

and

$$t = \frac{0.203\sqrt{18}}{\sqrt{1 - (0.203)^2}} = \frac{0.203 \times 4.243}{\sqrt{1 - 0.041209}} = \frac{0.8613}{0.9792} = 0.880.$$

The tabulated values are

$t_{.025, 18} = 2.101$ and $t_{.005, 18} = 2.878$.

The observed value is, therefore, insignificant at both the levels;
i.e., the population correlation may be supposed to be zero.
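The statistic of Example 17.10 needs only r and n, as the sketch below shows. The function name `t_for_zero_correlation` is our own, and the critical values are the tabulated ones quoted above.

```python
from math import sqrt

def t_for_zero_correlation(r, n):
    """t statistic r * sqrt(n - 2) / sqrt(1 - r^2) and its df = n - 2,
    for testing that the population correlation is zero."""
    return r * sqrt(n - 2) / sqrt(1 - r ** 2), n - 2

t, df = t_for_zero_correlation(0.203, 20)
t_025_18, t_005_18 = 2.101, 2.878      # tabulated values for df = 18
print(round(t, 3), df, abs(t) > t_025_18)
```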

(b) Problems regarding the difference between $\mu_x$ and $\mu_y$

Information regarding the difference between the means, $\mu_x$ and
$\mu_y$, may be of some importance when x and y are variables measured
in the same units.

To begin with, we note that if we take a new variable,

$$z = x - y,$$

then this z, being a linear function of normal variables, is itself
normally distributed with mean

$$\mu_z = \mu_x - \mu_y$$

and variance

$$\sigma_z^2 = \sigma_x^2 + \sigma_y^2 - 2\rho\sigma_x\sigma_y.$$

It will follow, from what we have said in Section 17.5, that
if we put $z_i = x_i - y_i$, $\bar{z} = \sum_i z_i / n$ and $s_z'^2 = \frac{1}{n - 1}\sum_i (z_i - \bar{z})^2$, then

$$\sqrt{n}(\bar{z} - \mu_z)/s_z' \qquad (17.24)$$

will be distributed as a t with df = n - 1. This will provide us with a
test for $H_0 : \mu_x - \mu_y = \xi_0$, which is equivalent to $H_0 : \mu_z = \xi_0$, and with
confidence limits to the difference $\mu_z = \mu_x - \mu_y$. The statistic (17.24)
is often referred to as a paired t.

We may, instead, be interested in the ratio $\mu_x/\mu_y = \eta$ (say). In
this case, we shall take

$$z = x - \eta y,$$

which again is normally distributed with mean

$$\mu_z = \mu_x - \eta\mu_y = 0.$$

Hence the statistic

$$t = \sqrt{n}\,\bar{z}/s_z'$$

is distributed as a t (i.e. paired t) with df = n - 1. This can be used
for testing the hypothesis $H_0 : \mu_x/\mu_y = \eta_0$ or for setting confidence
limits to the ratio $\mu_x/\mu_y$.
Example 17.11 The weights of ten boys before they are subjected
to a change of diet and after a lapse of six months are recorded
below:

                 Weight (in lb.)
Serial No.     Before     After
     1          109        115
     2          112        120
     3           98         99
     4          114        117
     5          102        105
     6           97         98
     7           88         91
     8          101         99
     9           89         93
    10           91         89

Test whether there has been any significant gain in weight as a
result of the change of diet.


If we denote by y and x the weight of a boy before and after the
change of diet, then the hypothesis to be tested is $H_0 : \mu_x = \mu_y$, the
alternative being $H : \mu_x > \mu_y$.

Under the assumptions (a) that x and y are jointly normally
distributed and (b) that the pairs of values of x and y form a random
sample from the bivariate normal distribution, the test for $H_0$ is
given by

$$t = \sqrt{n}\,\bar{z}/s_z',$$

with df = n - 1, where z = x - y.

For the given sample, the values of z are:

6, 8, 1, 3, 3, 1, 3, -2, 4, -2.

Hence $\bar{z} = 25/10 = 2.5$,

$$s_z' = \sqrt{\frac{153 - 10 \times (2.5)^2}{9}} = \sqrt{10.0556} = 3.171$$

and

$$t = \frac{\sqrt{10} \times 2.5}{3.171} = \frac{3.162 \times 2.5}{3.171} = 2.493.$$
Now, f-o5,9= 1'833 and to o=2821. The observed value is thus 
significant at the 5% but insignificant at the 1% level of significance. 
If we choose the 5% level, then the null hypothesis should be 
rejected and we should say that the change of diet results in a gain 
in average weight. 
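
The arithmetic of Example 17.11 can be checked with a short Python sketch. The function name `paired_t` is an illustrative choice, not something from the text; the divisor $n-1$ follows the definition of $s_z'$ in (17.24).

```python
import math

def paired_t(before, after):
    """Paired t statistic for H0: mean(after - before) = 0, with df = n - 1."""
    z = [a - b for a, b in zip(after, before)]
    n = len(z)
    z_bar = sum(z) / n
    # s' uses the divisor n - 1, as in the definition accompanying (17.24)
    s_prime = math.sqrt(sum((zi - z_bar) ** 2 for zi in z) / (n - 1))
    t = math.sqrt(n) * z_bar / s_prime
    return t, n - 1

# Weights (lb.) of the ten boys in Example 17.11
before = [109, 112, 98, 114, 102, 97, 88, 101, 89, 91]
after = [115, 120, 99, 117, 105, 98, 91, 99, 93, 89]

t, df = paired_t(before, after)
print(f"t = {t:.3f} with df = {df}")
```

The value of t is then compared with the tabulated one-sided points quoted in the example.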
(c) Problems regarding the ratio $\sigma_x/\sigma_y$

When x and y are variables measured in identical units, one may
also be interested in the ratio $\sigma_x/\sigma_y$. Let us denote this ratio by $\xi$.
If we consider the new variables

$$u = x + \xi y \quad\text{and}\quad v = x - \xi y,$$

then u and v are jointly normally distributed, like x and y, and

$$\operatorname{cov}(u, v) = \sigma_x^2 - \xi^2\sigma_y^2 = 0.$$

Thus u and v are uncorrelated normal variables.

In going to test for the hypothesis

$$H_0: \sigma_x/\sigma_y = \xi_0$$

we shall, therefore, take two new variables,

$$u = x + \xi_0 y \quad\text{and}\quad v = x - \xi_0 y,$$

and shall instead test for the equivalent hypothesis $H_0: \rho_{uv} = 0$.
This test will be given by the statistic

$$t = r_{uv}\sqrt{n-2}\,\big/\sqrt{1 - r_{uv}^2} \tag{17.25}$$

with df $= n-2$, $r_{uv}$ being the sample correlation between u and v.
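
A minimal Python sketch of the statistic (17.25) follows. The function name `sd_ratio_t` and the small data sets are hypothetical and serve only to show the mechanics: form u and v with the hypothesised ratio, correlate them, and convert the correlation to a t.

```python
import math

def sd_ratio_t(x, y, xi0=1.0):
    """t statistic (17.25) for H0: sigma_x / sigma_y = xi0, with df = n - 2."""
    n = len(x)
    u = [a + xi0 * b for a, b in zip(x, y)]
    v = [a - xi0 * b for a, b in zip(x, y)]
    u_bar, v_bar = sum(u) / n, sum(v) / n
    s_uv = sum((a - u_bar) * (b - v_bar) for a, b in zip(u, v))
    s_uu = sum((a - u_bar) ** 2 for a in u)
    s_vv = sum((b - v_bar) ** 2 for b in v)
    r_uv = s_uv / math.sqrt(s_uu * s_vv)
    return r_uv * math.sqrt(n - 2) / math.sqrt(1.0 - r_uv ** 2), n - 2

# Hypothetical paired data: y_same has exactly the same spread as x,
# y_wide is much more variable than x.
x = [1, 2, 3, 4, 5]
y_same = [2, 1, 4, 3, 5]
y_wide = [2, 0, 8, 6, 10]

t_same, df = sd_ratio_t(x, y_same)
t_wide, _ = sd_ratio_t(x, y_wide)
print(t_same, t_wide, df)
```

When the two sample variances coincide the statistic is zero; a negative value arises when y is the more variable of the two.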
To have confidence limits for $\xi$, we utilise the fact that, with

$$u = x + \xi y, \qquad v = x - \xi y,$$

$$P\left[\frac{|r_{uv}|\sqrt{n-2}}{\sqrt{1 - r_{uv}^2}} \le t_{\alpha/2,\,n-2}\right] = 1 - \alpha$$

or

$$P\left[\frac{r_{uv}^2\,(n-2)}{1 - r_{uv}^2} \le t^2_{\alpha/2,\,n-2}\right] = 1 - \alpha.$$

By solving the equation

$$r_{uv}^2\,(n-2) = t^2_{\alpha/2,\,n-2}\,(1 - r_{uv}^2)$$

or, say, $\psi(\xi) = 0$

for the unknown ratio $\xi = \sigma_x/\sigma_y$, two roots will be obtained. In case
the roots, say $\xi_1$ and $\xi_2$, are real ($\xi_1 < \xi_2$), these will be the required
confidence limits for $\xi$ with confidence coefficient $1 - \alpha$.

Again, $\psi(\xi)$ may be either a convex or a concave function. In
the former case we shall say $\xi_1 \le \xi \le \xi_2$, while in the latter we shall
say $0 \le \xi \le \xi_1$ or $\xi_2 \le \xi < \infty$.

But the roots may as well be imaginary, in which case we shall
say that for the given sample $100(1-\alpha)\%$ confidence limits do
not exist.


17.8 Problems relating to a simple regression equation

Let x and y be two variables such that the distribution of y for
each value of x is normal, with mean

$$\eta_x = \alpha + \beta x \quad\text{(say)}$$

(which means that the population regression of y on x is linear)
and a constant variance, $\sigma^2$. Note that here y must be a random
variable but x may be either random or non-random. In the former
case, the distribution of y referred to above is the conditional
distribution of y, given x.

First, consider the case where y is random but not x. The fixed
values of x may be denoted by $x_1, x_2, \ldots, x_n$ and let $y_1, y_2, \ldots, y_n$
be, respectively, independent random samples of size one each from
the distributions of y corresponding to $x_1, x_2, \ldots, x_n$. It will be
assumed that $\sum_i (x_i - \bar x)^2 > 0$ and $n > 2$.


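The estimated (least-square) regression line will be given by

$$Y = a + bx,$$

where

$$b = \frac{\sum_i (x_i - \bar x)(y_i - \bar y)}{\sum_i (x_i - \bar x)^2} = \frac{\sum_i (x_i - \bar x)\,y_i}{\sum_i (x_i - \bar x)^2}$$

and $a = \bar y - b\bar x$, with $\bar x = \sum_i x_i/n$, $\bar y = \sum_i y_i/n$.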


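In order to study the sampling distribution of b, note that b may
be looked upon as a linear function of the $y_i$:

$$b = \sum_i w_i y_i,$$

where $w_i = (x_i - \bar x)\big/\sum_j (x_j - \bar x)^2$.

Being a linear function of the $y_i$, which are normal variables, b is
itself a normal variable; and

$$E(b) = \sum_i w_i E(y_i) = \sum_i w_i(\alpha + \beta x_i) = \beta \tag{17.26}$$

[since $\sum_i w_i = 0$ and $\sum_i w_i x_i = 1$],

and

$$\operatorname{var}(b) = \sum_i w_i^2 \operatorname{var}(y_i) \quad\text{[the y's being independent]}$$

$$= \sigma^2 \sum_i w_i^2 = \sigma^2/S_{xx}, \tag{17.27}$$

where $S_{xx} = \sum_i (x_i - \bar x)^2$.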
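
The three identities used above, $\sum w_i = 0$, $\sum w_i x_i = 1$ and $\sum w_i^2 = 1/S_{xx}$, can be verified numerically. The sketch below uses made-up x and y values purely for illustration; `slope_weights` is a hypothetical helper name.

```python
def slope_weights(x):
    """Weights w_i = (x_i - x_bar) / S_xx, so that b = sum(w_i * y_i)."""
    n = len(x)
    x_bar = sum(x) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    return [(xi - x_bar) / s_xx for xi in x], s_xx

# Hypothetical fixed x values and observed y values
x = [1.0, 2.0, 4.0, 7.0]
y = [2.1, 2.9, 5.2, 7.8]

w, s_xx = slope_weights(x)
b = sum(wi * yi for wi, yi in zip(w, y))

# The same slope from the usual formula S_xy / S_xx
x_bar, y_bar = sum(x) / len(x), sum(y) / len(y)
b_direct = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / s_xx

print(sum(w), sum(wi * xi for wi, xi in zip(w, x)), sum(wi ** 2 for wi in w) * s_xx)
print(b, b_direct)
```

Because the weights depend only on the fixed x values, the mean and variance of b follow at once from those of the $y_i$.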


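Similarly, a, which is a linear function of the normal variables $y_i$, is
itself normally distributed, with

$$E(a) = E(\bar y) - E(b)\,\bar x = \alpha + \beta\bar x - \beta\bar x = \alpha$$

and

$$\operatorname{var}(a) = \operatorname{var}(\bar y) - 2\bar x\operatorname{cov}(\bar y, b) + \bar x^2\operatorname{var}(b). \tag{17.28}$$

Now,

$$\operatorname{var}(\bar y) = \sigma^2/n, \qquad \operatorname{var}(b) = \sigma^2/S_{xx},$$

while

$$\operatorname{cov}(\bar y, b) = \operatorname{cov}\Big(\sum_i y_i/n,\ \sum_i w_i y_i\Big) = \frac{1}{n}\sum_i w_i\operatorname{var}(y_i) = \frac{\sigma^2}{n}\sum_i w_i = 0,$$

so that

$$\operatorname{var}(a) = \sigma^2\Big(\frac{1}{n} + \frac{\bar x^2}{S_{xx}}\Big). \tag{17.29}$$

Again, for any given x, the estimated value ($Y_x = a + bx$) of y, being a
linear function of a and b, is normal, with

$$E(Y_x) = E(a) + E(b)\,x = \alpha + \beta x = \eta_x \tag{17.30}$$

and

$$\operatorname{var}(Y_x) = \operatorname{var}\{\bar y + b(x - \bar x)\} = \operatorname{var}(\bar y) + (x - \bar x)^2\operatorname{var}(b)$$

$$= \frac{\sigma^2}{n} + \frac{(x - \bar x)^2\sigma^2}{S_{xx}} = \sigma^2\Big\{\frac{1}{n} + \frac{(x - \bar x)^2}{S_{xx}}\Big\}. \tag{17.31}$$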


Hence if $\sigma^2$ be known, tests and confidence limits for $\alpha$ and $\beta$ can
be obtained considering

$$\frac{a - \alpha}{\sigma\sqrt{\dfrac{1}{n} + \dfrac{\bar x^2}{S_{xx}}}} \tag{17.32}$$

and

$$\frac{(b - \beta)\sqrt{S_{xx}}}{\sigma} \tag{17.33}$$

to be normal deviates. Similarly, to test whether, corresponding to
a given value of x, $\eta_x$ has a specified value or to set confidence limits
to $\eta_x$, we can use the statistic

$$\frac{Y_x - \eta_x}{\sigma\sqrt{\dfrac{1}{n} + \dfrac{(x - \bar x)^2}{S_{xx}}}} \tag{17.34}$$

as a standard normal variable.

In case $\sigma^2$ is unknown, one would use its unbiassed estimator*

$$s_{y\cdot x}^2 = \sum_i (y_i - Y_i)^2/(n-2) = \Big\{\sum_i (y_i - \bar y)^2 - b\sum_i (x_i - \bar x)(y_i - \bar y)\Big\}\Big/(n-2)$$

$$= (S_{yy} - bS_{xy})/(n-2). \tag{17.35}$$
In order to test the hypothesis $H_0: \alpha = \alpha_0$ ($\alpha_0$ being the value of $\alpha$
as specified by the hypothesis), one will then use as the test statistic

$$\frac{a - \alpha_0}{s_{y\cdot x}\sqrt{\dfrac{1}{n} + \dfrac{\bar x^2}{S_{xx}}}} \tag{17.36}$$

which is, under $H_0$, distributed as a t with df $= n-2$. The decision
as to the acceptance or rejection of $H_0$ will have to be taken by
comparing the observed value of (17.36) with the appropriate
tabulated value of t with df $= n-2$.

In order to test the hypothesis $H_0: \beta = \beta_0$, one will similarly use
as the test statistic

$$(b - \beta_0)\sqrt{S_{xx}}\,\big/\,s_{y\cdot x}, \tag{17.37}$$

which, again, is distributed as a t with df $= n-2$.

Similarly, to test the hypothesis $H_0: \eta_x = \eta_x^0$ (viz. that the mean
of y for a given value of x has a specified value), one will use as the
test statistic

$$\frac{Y_x - \eta_x^0}{s_{y\cdot x}\sqrt{\dfrac{1}{n} + \dfrac{(x - \bar x)^2}{S_{xx}}}} \tag{17.38}$$

which, too, is distributed as a t with df $= n-2$.
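To set confidence limits to $\alpha$, one may start from the result that

$$P\left[-t_{\gamma/2,\,n-2} \le \frac{a - \alpha}{s_{y\cdot x}\sqrt{\dfrac{1}{n} + \dfrac{\bar x^2}{S_{xx}}}} \le t_{\gamma/2,\,n-2}\right] = 1 - \gamma,$$

whatever the true values of the parameters $\alpha$, $\beta$ and $\sigma^2$ may be.
But this means that

$$P\left[a - t_{\gamma/2,\,n-2}\,s_{y\cdot x}\sqrt{\frac{1}{n} + \frac{\bar x^2}{S_{xx}}} \le \alpha \le a + t_{\gamma/2,\,n-2}\,s_{y\cdot x}\sqrt{\frac{1}{n} + \frac{\bar x^2}{S_{xx}}}\right] = 1 - \gamma.$$

As such, the observed values of

$$a \mp t_{\gamma/2,\,n-2}\,s_{y\cdot x}\sqrt{\frac{1}{n} + \frac{\bar x^2}{S_{xx}}}$$

will serve as confidence limits to $\alpha$ with confidence coefficient $1 - \gamma$.

* $S_{yy}$ is analogous to $S_{xx}$ and equals $\sum_i (y_i - \bar y)^2$.

To set confidence limits to $\beta$, one may likewise start from the
result that

$$P\left[-t_{\gamma/2,\,n-2} \le \frac{(b - \beta)\sqrt{S_{xx}}}{s_{y\cdot x}} \le t_{\gamma/2,\,n-2}\right] = 1 - \gamma,$$

whatever the true values of the parameters may be. This will
mean that the observed values of

$$b \mp t_{\gamma/2,\,n-2}\,s_{y\cdot x}\big/\sqrt{S_{xx}}$$

will serve as confidence limits to $\beta$ with confidence coefficient $1 - \gamma$.

To set confidence limits to $\eta_x$, the mean of y for a given x, one
should start from the result that

$$P\left[-t_{\gamma/2,\,n-2} \le \frac{Y_x - \eta_x}{s_{y\cdot x}\sqrt{\dfrac{1}{n} + \dfrac{(x - \bar x)^2}{S_{xx}}}} \le t_{\gamma/2,\,n-2}\right] = 1 - \gamma,$$

whatever the true values of the parameters may be. Hence the
observed values of

$$Y_x \mp t_{\gamma/2,\,n-2}\,s_{y\cdot x}\sqrt{\frac{1}{n} + \frac{(x - \bar x)^2}{S_{xx}}}$$

will be confidence limits to $\eta_x$ with confidence coefficient $1 - \gamma$.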

By varying x over a suitable set of values, one will get a series of
lower confidence limits and a series of upper confidence limits.
Plotting these limits against the chosen values of x, one may draw a
curve through the lower limits and one through the upper limits.
The enclosed region is called the $100(1-\gamma)\%$ confidence band for the mean
of y. This may be used to determine the confidence limits to the
mean of y for any specified value of x.
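
As an illustration of how such a band may be traced, the sketch below evaluates the limits on a small grid of x values. It borrows, as an assumption for the sake of a concrete example, the summary figures of Example 17.12 (n = 20, a = 3.66, b = 0.932, $\bar x$ = 66.21, $S_{xx}$ = 120.56, $s_{y\cdot x}$ = 1.507) and the tabulated value $t_{.025,18}$ = 2.101; the function name `mean_band` and the grid are hypothetical.

```python
import math

def mean_band(x, a, b, s, n, x_bar, s_xx, t_crit):
    """Lower and upper confidence limits to the mean of y at a given x."""
    y_hat = a + b * x
    half = t_crit * s * math.sqrt(1.0 / n + (x - x_bar) ** 2 / s_xx)
    return y_hat - half, y_hat + half

# Summary figures taken from Example 17.12; t_crit is read from the t-table.
params = dict(a=3.66, b=0.932, s=1.507, n=20, x_bar=66.21, s_xx=120.56, t_crit=2.101)

grid = [62.0, 64.0, 66.21, 68.0, 70.0]
band = [(x, *mean_band(x, **params)) for x in grid]
for x, lower, upper in band:
    print(f"x = {x:6.2f}: {lower:6.2f} to {upper:6.2f}")

widths = [upper - lower for _, lower, upper in band]
```

The band is narrowest at $x = \bar x$ and widens as x moves away from $\bar x$ in either direction, which is why the two curves bow away from the fitted line.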


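In this connection, one may also consider the problem of
predicting, by giving suitable limits based on the regression line, the
value of y corresponding to a given value of x. For given x, $y - Y_x$ is a
normal variable, with

$$E(y - Y_x) = \eta_x - \eta_x = 0 \tag{17.39}$$

and

$$\operatorname{var}(y - Y_x) = \operatorname{var}(y) + \operatorname{var}(Y_x) = \sigma^2\Big\{1 + \frac{1}{n} + \frac{(x - \bar x)^2}{S_{xx}}\Big\}, \tag{17.40}$$

so that

$$P\left[Y_x - \tau_{\gamma/2}\,\sigma\sqrt{1 + \frac{1}{n} + \frac{(x - \bar x)^2}{S_{xx}}} \le y \le Y_x + \tau_{\gamma/2}\,\sigma\sqrt{1 + \frac{1}{n} + \frac{(x - \bar x)^2}{S_{xx}}}\right] = 1 - \gamma.$$

Hence if $\sigma$ be known, then the prediction limits to y (with
"confidence coefficient" $1 - \gamma$) will be

$$Y_x \mp \tau_{\gamma/2}\,\sigma\sqrt{1 + \frac{1}{n} + \frac{(x - \bar x)^2}{S_{xx}}}.$$

In case $\sigma$ is unknown, it will be replaced by $s_{y\cdot x}$ and $\tau_{\gamma/2}$ will be
replaced by $t_{\gamma/2,\,n-2}$.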
We may now turn to the case where x as well as y is random.
The procedures outlined above will remain valid here also. Take,
for instance, the sampling distribution of

$$\frac{(b - \beta)\sqrt{S_{xx}}}{s_{y\cdot x}}.$$

For given $x_1, x_2, \ldots, x_n$, the conditional distribution of this is a
t-distribution with df $= n-2$. However, since a t-distribution has
the df as its only parameter (and not the fixed values of $x_1, \ldots, x_n$),
the unconditional (or marginal) distribution of the above quantity
will also be a t-distribution with df $= n-2$. Hence the test for
$H_0: \beta = \beta_0$ or the confidence interval for $\beta$ as suggested in the case
of non-random x will remain valid even in the present case. The
same is true for the other tests and confidence intervals.

Example 17.12 For 20 pairs of fathers and sons, the regression
equation of height of son (y) on height of father (x), both measured
in inches, was found to be

$$Y = 3.66 + 0.932x.$$

Test whether a differs significantly from zero and b differs
significantly from unity. For the 20 pairs, $\bar x = 66.21$,
$\sum_i (x_i - \bar x)^2 = 120.56$ and $\sum_i (y_i - \bar y)^2 = 145.61$.

For making the test, we assume that the population regression of
y on x is linear:

$$\eta_x = \alpha + \beta x \quad\text{(say)}.$$

We have then to test $H_0: \alpha = 0$ and $H_0: \beta = 1$. We make the usual
assumption regarding the conditional distribution of y for given x.
Here

$$s_{y\cdot x}^2 = \frac{S_{yy} - bS_{xy}}{n-2} = \frac{S_{yy} - b^2S_{xx}}{n-2} = \frac{145.61 - (0.932)^2\times 120.56}{18} = \frac{145.61 - 104.72}{18} = 2.272,$$

so that $s_{y\cdot x} = 1.507$.

Hence for testing $H_0: \alpha = 0$, we have

$$t = \frac{a - 0}{s_{y\cdot x}\sqrt{\dfrac{1}{n} + \dfrac{\bar x^2}{S_{xx}}}} \quad\text{(with df = 18)}$$

$$= \frac{3.66}{(2.272)^{1/2}\Big(0.05 + \dfrac{(66.21)^2}{120.56}\Big)^{1/2}} = \frac{3.66}{(2.272)^{1/2}(0.05 + 36.36)^{1/2}} = \frac{3.66}{1.507\times 6.034} = 0.403.$$

For testing $H_0: \beta = 1$, we have

$$t = \frac{(b - 1)\sqrt{S_{xx}}}{s_{y\cdot x}} \quad\text{(with df = 18)}$$

$$= \frac{(0.932 - 1)\sqrt{120.56}}{1.507} = -\frac{0.068\times 10.98}{1.507} = -0.495.$$

The table of the t-distribution gives

$$t_{.025,\,18} = 2.101 \quad\text{and}\quad t_{.005,\,18} = 2.878.$$

The observed value of each t is thus insignificant at both the 1%
and 5% significance levels. Thus the observed a does not differ
significantly from 0, nor does the observed b differ significantly
from 1.


Example 17.13 With the data of Example 17.12, one may like to
set 95% confidence limits to the conditional mean $\eta_x$ of y for a given
x, say for x = 70 (inches).

Here with $\gamma = 0.05$, we have

$$t_{\gamma/2,\,n-2}\,s_{y\cdot x}\sqrt{\frac{1}{n} + \frac{(x - \bar x)^2}{S_{xx}}} = 2.101\times 1.507\Big(0.05 + \frac{(3.79)^2}{120.56}\Big)^{1/2}$$

$$= 3.166\,(0.05 + 0.1191)^{1/2} = 3.166\times 0.4112 = 1.302.$$

Also, $Y_{70} = 68.90$.

As such, the desired confidence limits are
68.90 - 1.30 = 67.60 (inches) and 68.90 + 1.30 = 70.20 (inches),
with confidence coefficient 0.95.

One may as well like to predict, in the light of these data, the
height of a son when the height of the father is known to be 70
inches.

Assuming that the chosen confidence coefficient is 0.95, we now
note that, for $\gamma = 0.05$,

$$t_{\gamma/2,\,n-2}\,s_{y\cdot x}\sqrt{1 + \frac{1}{n} + \frac{(x - \bar x)^2}{S_{xx}}} = 2.101\times 1.507\Big(1 + 0.05 + \frac{(3.79)^2}{120.56}\Big)^{1/2}$$

$$= 3.166\,(1 + 0.05 + 0.1191)^{1/2} = 3.166\times 1.081 = 3.422.$$

One may then say, with 95% confidence, that for a father who
is 70 inches tall, the height of an (adult) son will be between
68.90 - 3.42 = 65.48 inches and 68.90 + 3.42 = 72.32 inches.
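
The computations of Examples 17.12 and 17.13 can be reproduced from the summary figures alone. The sketch below is only a check on the arithmetic; the variable names are illustrative, the critical value 2.101 is the tabulated $t_{.025,18}$ quoted above, and $S_{xy}$ is obtained as $bS_{xx}$.

```python
import math

# Summary figures from Example 17.12
n, a, b = 20, 3.66, 0.932
x_bar, s_xx, s_yy = 66.21, 120.56, 145.61
t_crit = 2.101  # tabulated t_{.025, 18}

# Residual variance (17.35), using S_xy = b * S_xx
s2 = (s_yy - b ** 2 * s_xx) / (n - 2)
s = math.sqrt(s2)

# Test statistics (17.36) and (17.37) for H0: alpha = 0 and H0: beta = 1
t_alpha = (a - 0.0) / (s * math.sqrt(1.0 / n + x_bar ** 2 / s_xx))
t_beta = (b - 1.0) * math.sqrt(s_xx) / s

# Limits at x = 70: for the mean of y, and for a single new value of y
x0 = 70.0
y0 = a + b * x0
ci_half = t_crit * s * math.sqrt(1.0 / n + (x0 - x_bar) ** 2 / s_xx)
pred_half = t_crit * s * math.sqrt(1.0 + 1.0 / n + (x0 - x_bar) ** 2 / s_xx)

print(f"s^2 = {s2:.3f}, t_alpha = {t_alpha:.3f}, t_beta = {t_beta:.3f}")
print(f"mean of y at 70: {y0 - ci_half:.2f} to {y0 + ci_half:.2f}")
print(f"predicted y at 70: {y0 - pred_half:.2f} to {y0 + pred_half:.2f}")
```

The prediction interval is much wider than the interval for the mean because it also carries the variance $\sigma^2$ of an individual observation about the regression line.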


Questions and exercises 


17.1 Suggest some exact test procedures for hypotheses concerning
the parameter p of a binomial distribution and the parameter $\lambda$
of a Poisson distribution.

17.2 Describe how one may compare two sample proportions
and test for association between two attributes in case the sample
size is not large.

17.3 Why is it that, for data of somewhat similar types, we
sometimes use Fisher's t-test and at some other times the paired
t-test? What type of test would you use in case the assumption of
homoscedasticity underlying Fisher's t-test is untenable?

17.4 What is the Behrens-Fisher problem? Describe the 
various solutions suggested for this problem. Which one would you 
recommend and why ? 

17.5 How do you test for, or set confidence limits to, the ratio
of two variances? Consider separately the case of two univariate
normal distributions and that of a bivariate normal distribution.

17.6 The sample correlation coefficient r, for sampling from a
bivariate normal distribution with $\rho = 0$, has the p.d.f.

$$f(r) = \frac{1}{B\big(\frac{1}{2}, \frac{n-2}{2}\big)}\,(1 - r^2)^{(n-4)/2}, \qquad -1 < r < 1.$$

Show that the statistic $r\sqrt{n-2}\big/\sqrt{1 - r^2}$ has the t-distribution with
df $= n-2$.

Explain how this statistic can be used to test the hypothesis
$H_0: \rho = 0$.

17.7 When a simple regression equation $y = \alpha + \beta x$ is fitted to
bivariate data, how would you test for hypotheses concerning $\alpha$ and
$\beta$? If two such regression equations are available, how would you
test that the population regression equations are (a) identical,
(b) parallel?

17.8 For bivariate data, how does one test for, or set confidence
limits to, the array mean of one variable (y) for a given value of
the other (x)? Describe also the method of predicting the value of
y for a given value of x.

17.9 Twenty-five guinea-pigs are suffering from a disease. In
order to study the therapeutic value of a serum, it is administered
to a randomly selected group of 15 guinea-pigs, the remaining 10
being left untreated. The results are as follows:

              Treated    Untreated
Recovered        9           3
Died             6           7

Would you consider the serum beneficial? Partial ans. Under $H_0$,
P (this and more extreme tables) = 0.1442.


17.10 A 5-foot specimen of a new type of fibre is found to have
13 defects, while the manufacturer claims that there are no more
than 150 defects per 100 feet. Do the above data support this
claim? Partial ans. $P[x \ge 13 \mid H_0] = 0.0812$.

17.11 The mean I.Q. for a group of 25 children is 108.481, the
standard deviation being 17.255. Test whether the observed mean
is significantly greater than 100. Partial ans. t = 2.458.

17.12 The mean yield per plant for 11 tomato plants of a particular
variety was found to be 1,284.73 gm. with a standard deviation
of 96.41 gm. Set up 99% confidence limits to the mean yield
of all plants of this variety. Ans. 1,192.61 and 1,376.85 gm.

17.13 For the data of Exercise 17.12, obtain 99% confidence
limits to the population standard deviation of yield of plants.
Ans. 60.75 and 207.67 gm.

17.14 The marks obtained by 20 students of College A and 15
students of College B in a mathematics test are given below:


College A College B 
BIAN | o" ATA Tao a aR 
76 84 8l 49 61 55). 90 
CS RL: ee UR E) 36 «681 nade 
69 88 43 80 A230 OP 
a5 52"... 0. - 42 T aa ie 


Do you think students of College A are more proficient in mathematics
than students of College B? Partial ans. t = 1.275.


17.15 Fifteen bars of steel produced by Process I have mean
breaking strength 46.2 with s.d. 8.7, while 12 bars produced by
Process II have mean breaking strength 57.5 with s.d. 11.6. There
is not enough ground to suppose that the population s.d.s are equal.
Test whether the population means may be supposed to be equal.
Partial ans. u = 2.802, with $\dfrac{s_1^2/n_1}{s_1^2/n_1 + s_2^2/n_2} = 0.310$.

17.16 It is known that the mean diameters of rivets produced by
two firms, I and II, are practically the same; but the standard
deviations may differ. For 22 rivets produced by Firm I the standard
deviation is 2.9 mm., while for 16 rivets manufactured by Firm II the
standard deviation is 3.8 mm. Do you think that the product of Firm I
is of a better quality than that of Firm II? Partial ans. F = 1.72.


17.17 The additional hours of sleep gained after using each of
two drugs by the same group of 12 patients are given below:

Patient     Additional hours of sleep
             Drug 1      Drug 2
   1           2.1         3.6
   2           0.2         4.7
   3           0.9         1.8
   4           3.8         5.5
   5           3.5         4.6
   6          -0.2        -0.3
   7          -1.3        -0.4
   8          -0.3         1.9
   9          -1.7         2.0
  10           0.7         1.7
  11          -0.8         2.1
  12           1.3         1.1

Test whether the second drug gives, on the average, at least an
hour more of sleep than the first drug. Partial ans. t = 1.633.

17.18 For the data of Exercise 17.17, judge whether the standard
deviations of the two series are significantly different.
Partial ans. |t| = 0.328.

17.19 The correlation coefficient between head-length and
stature for a sample of 36 members of an Indian tribe has been
found to be 0.4339. Is it reasonable to assume that in the population
the characters are uncorrelated? Partial ans. t = 2.808.

17.20 The age in years (x) and chest-girth in inches (y) were
recorded for two groups of school-boys consisting of 15 and 18 boys,
respectively. On the basis of these data, the following values were
obtained:

                  Group 1       Group 2
$\sum x_i$         202.71        244.12
$\sum y_i$          23.29         53.78
$\sum x_i^2$     2,742.562     3,314.014
$\sum y_i^2$        44.775       174.407
$\sum x_i y_i$     314.921       729.604

Determine for each group the linear regression equation of y on x.
Hence examine if the corresponding population regression equations
(assumed linear) may be supposed to be identical or parallel.

Partial ans. Regression equations are: Y = 0.778 + 0.0573x
and Y = 2.002 + 0.0712x. For $a_1, a_2$: t = 0.132;
for $b_1, b_2$: t = -0.018.


SUGGESTED READING 


[1] Anderson, R. L. and Bancroft, T, A. Statistical Theory in Research 
(Chs. 7, 13). McGraw-Hill, 1952. 

{2] Dixon, W.J. and Messey, F. J. Introduction to Statistical Analysis 
(Chs, 6-8, 10, 11). McGraw-Hill, 1969, and Kogakusha, 

[3] Goulden, C. H. Methods of Statistical Analysis (Chs. 4, 6—8). 
Asia Pulis.ing House, 1959. 

[4] Hald, A. Statistical Theory with Engineering Applications (Chs. 
9—11, 18). John Wiley, 1952. 

[5] Hogg, R. V. and Craig, A. T. Introduction to Mathematical 
Statistics (Chs. 6, 9, 10). Macmillan, 1970, and Amerind. 

[6] Johnson, N. L. and Leone, F. C. Statistics and Experimental Design, 
Vol. I (Chs. 8, 12). John Wiley, 1964. 

[7] Keeping, E. S. Introduction to Statistical Inference (Chs. 8, 11). 
Van Nostrand, 1962, and Affiliated East-West Press. 

[8] Mood, A. M., Graybill, F, A. and Boes, D. C. Introduction to 
the Theory of Statistics (Chs. 8, 9). McGraw-Hill, 1974, and 
Kégakusha. 

[9] Walker, H. M. and Lev, J. Statistical Inference (Chs. 3, 4, 7—10). 
Holt, Rinehart & Winston, 1953, and Oxford & IBH, 1965. 


18 SOME FURTHER EXACT PROCEDURES


18.1 Introduction 

In Chapter 17, we presented exact tests and confidence intervals 
in the context of sampling from one or two univariate normal 
distributions or from a bivariate normal distribution. In the present 
chapter, further tests and confidence intervals will be presented 
before the reader. The population distribution in some cases 
will be supposed to be multivariate normal. In some other cases, 
several (more than two) univariate normal distributions will be 
kept in view. The inference problem may relate to a multiple 
correlation coefficient or a partial correlation coefficient in a multi- 
variate normal distribution. Or it may relate to the parameters 
in a multiple regression equation. Alternatively, the problem may 
arise from the need to compare the means or the variances of k 
univariate normal distributions. 

In addition, we shall treat the problem of detecting outliers in a
set of observations. The question here is to judge whether one
or a few observations that are somewhat widely different from the
other observations in a set may be regarded as drawn at random
from the same distribution as the others. Here also the distribution
or distributions from which the observations have been taken will be
supposed to be univariate normal. A method of combining the
results of several tests of a hypothesis will also be considered.
Lastly, there will be given tests for judging the validity of the
normality assumption.


18.2 Comparison of means of more than two normal distributions

Suppose there are k populations in each of which the variable x
is normally distributed. Let $\mu_i$ ($i = 1, 2, \ldots, k$) be the unknown
mean of x in the ith population. We want to test whether the k
means may be supposed to be equal. Thus our null hypothesis is

$$H_0: \mu_1 = \mu_2 = \cdots = \mu_k,$$

which is to be tested against all alternatives.

The standard deviations, assumed unknown, will be supposed to
be equal, the common value being denoted by $\sigma$.

Let independent random samples be taken from the distributions,
the size of the sample from the ith distribution being $n_i$ ($\ge 2$ for
at least one i). The observations from the ith distribution may be
denoted by

$$x_{i1}, x_{i2}, \ldots, x_{in_i}.$$

Let $\bar x_i = \sum_{j} x_{ij}/n_i$, the ith sample mean, and let

$$\bar x = \sum_{i=1}^{k}\sum_{j=1}^{n_i} x_{ij}/n,$$

the grand mean, where $n = \sum_{i=1}^{k} n_i$.
i=1 ` 
Now, consider the sum of squares of the deviations of the observations
from the grand mean, to be called the 'total SS':

$$\text{total SS} = \sum_{i}\sum_{j} (x_{ij} - \bar x)^2. \tag{18.1}$$

We can write:

$$\text{total SS} = \sum_{i}\sum_{j} \{(\bar x_i - \bar x) + (x_{ij} - \bar x_i)\}^2$$

$$= \sum_{i=1}^{k} n_i(\bar x_i - \bar x)^2 + \sum_{i=1}^{k}\sum_{j=1}^{n_i} (x_{ij} - \bar x_i)^2, \tag{18.2}$$

the cross-product term vanishing because $\sum_j (x_{ij} - \bar x_i) = 0$ for each i.

The first component represents the sum of (weighted) squares of
deviations of the sample means from the grand mean, which may
also be looked upon as representing the extent to which the sample
means differ among themselves. This is called the 'SS between
groups' (SSB).

The second component, on the other hand, uses deviations of
values within each sample from the sample mean and is called the
'SS within groups' (SSW). It can be shown that

$$E(\text{SSW}) = (n-k)\sigma^2, \tag{18.3}$$

while

$$E(\text{SSB}) = (k-1)\sigma^2 + \sum_i n_i(\mu_i - \bar\mu)^2, \tag{18.4}$$

where $\bar\mu = \sum_i n_i\mu_i/n$.


If we put

$$\text{MSB} = \text{SSB}/(k-1) \tag{18.5}$$

and

$$\text{MSW} = \text{SSW}/(n-k), \tag{18.6}$$

to be called the 'mean-square between groups' and the 'mean-square
within groups', respectively, then

$$E(\text{MSB}) = \sigma^2 + \frac{1}{k-1}\sum_{i=1}^{k} n_i(\mu_i - \bar\mu)^2 = \sigma_1^2, \text{ say,} \tag{18.7}$$

and

$$E(\text{MSW}) = \sigma^2. \tag{18.8}$$

Also,

$$\sum_{i=1}^{k} n_i(\mu_i - \bar\mu)^2 = 0$$

if, and only if, all the means $\mu_i$ are equal. Otherwise, it is positive.
Hence our problem reduces to the problem of testing

$$H_0: \sigma_1^2 = \sigma^2$$

against the alternative

$$H: \sigma_1^2 > \sigma^2,$$

which is similar to the problem posed in Section 17.6 regarding the
equality of two variances (or two standard deviations).

The test is given by

$$F = \frac{\text{MSB}}{\text{MSW}}, \tag{18.9}$$

which is, under $H_0$, distributed as an F-statistic with df $= (k-1, n-k)$.
We reject $H_0$ if

$$F > F_{\alpha;\,(k-1),\,(n-k)}$$

and accept it otherwise, $\alpha$ being the chosen level of significance.

The process of splitting the total SS into independent components
like SSB and SSW, which can be attributed to different sources of
variation, is called an analysis of variance, and is generally put in a
tabular form.

Computational procedure for the analysis of variance:

(1) Calculate the total for each group: $T_{10}, T_{20}, \ldots, T_{k0}$,
where $T_{i0} = \sum_j x_{ij}$.

(2) Calculate the grand total: $T_{00} = \sum_i\sum_j x_{ij} = \sum_i T_{i0}$.

(3) Calculate the raw total SS: $\sum_i\sum_j x_{ij}^2$.

(4) Calculate $\sum_i T_{i0}^2/n_i$.

(5) Calculate correction factor: $T_{00}^2/n$.

(6) Total SS $= \sum_i\sum_j x_{ij}^2 - T_{00}^2/n$ = value obtained in step (3) - that
obtained in step (5).

(7) SSB $= \sum_i T_{i0}^2/n_i - T_{00}^2/n$ = value obtained in step (4) - that
obtained in step (5).

(8) SSW = total SS - SSB = value obtained in step (6) - that
obtained in step (7).

It may be noted that sometimes calculations may be simplified
by making a change of base and scale for the observations. This will
not affect the test, for the F statistic defined by (18.9) remains
unaltered under such transformations.

Example 18.1 The weights in gm. of a number of copper wires,
each of length 1 metre, are obtained. These are shown below
classified according to the dies from which the wires come:

                 Die No.
    I       II      III     IV      V
   1.30    1.28    1.32    1.31    1.30
   1.32    1.35    1.29    1.29    1.32
   1.36    1.33    1.31    1.33    1.30
   1.35    1.34    1.28    1.31    1.33
   1.32            1.33    1.32
   1.37            1.30

Test the hypothesis that there is no difference between the mean
weights of wires coming from the different dies.

To test the hypothesis, we shall assume that the distribution of
weight of wire (x) is normal for each die and that the variances
for different dies are equal.

Let $x_{ij}$ be the weight of the jth copper wire coming from the
ith die. Taking the origin at 1.28 gm. and the unit as 0.01 gm., our
new variable is

$$u = 100(x - 1.28),$$

and so its value for the jth wire from the ith die is

$$u_{ij} = 100(x_{ij} - 1.28).$$

The values $u_{ij}$ are shown in Table 18.1.

TABLE 18.1
WEIGHTS OF COPPER WIRES AFTER CHANGE OF BASE AND SCALE

                 Die No.
          I     II    III    IV     V
          2     0     4      3      2
          4     7     1      1      4
          8     5     3      5      2
          7     6     0      3      5
          4           5      4
          9           2
Total    34    18    15     16     13

Here $T_{00} = \sum_i\sum_j u_{ij} = 96$,

$\sum_i\sum_j u_{ij}^2 = 504$,

$$\sum_i \frac{T_{i0}^2}{n_i} = \frac{34^2}{6} + \frac{18^2}{4} + \frac{15^2}{6} + \frac{16^2}{5} + \frac{13^2}{4}$$

$$= 192.6667 + 81 + 37.5 + 51.2 + 42.25 = 404.6167$$

and correction factor $= T_{00}^2/n = 96^2/25 = 368.64$.

Hence

total SS $= \sum_i\sum_j u_{ij}^2 - T_{00}^2/n = 504 - 368.64 = 135.36$,

SSB (i.e. SS due to dies) $= \sum_i T_{i0}^2/n_i - T_{00}^2/n = 404.6167 - 368.64 = 35.9767$

and SSW = total SS - SSB $= 135.36 - 35.9767 = 99.3833$.

We are now in a position to draw up the analysis of variance
table (Table 18.2).

TABLE 18.2
ANALYSIS OF VARIANCE FOR THE DATA ON WEIGHTS OF COPPER WIRES

Source of          df      SS          MS        F      Tabulated F at level
variation                                                 1%       5%
Between groups      4     35.9767     8.9942    1.81     4.43     2.87
Within groups      20     99.3833     4.9692
Total              24    135.3600

The observed F, being less than $F_{.05;\,4,20}$ and $F_{.01;\,4,20}$, is
insignificant at both levels of significance. Thus the hypothesis under test
should be accepted; i.e., there seems to be no reason to suppose
that (population) mean weights of copper wires for different dies
are unequal.
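
The eight computational steps of Section 18.2 can be sketched in Python as below and applied to the coded data of Table 18.1. The function name `one_way_anova` and the dictionary keys are illustrative choices. The last lines rebuild the weights in gm. from the coded values to illustrate that F is unaffected by the change of base and scale.

```python
def one_way_anova(groups):
    """One-way analysis of variance following steps (1)-(8) of Section 18.2."""
    n = sum(len(g) for g in groups)
    k = len(groups)
    group_totals = [sum(g) for g in groups]                 # step (1)
    grand_total = sum(group_totals)                         # step (2)
    raw_ss = sum(x * x for g in groups for x in g)          # step (3)
    between_raw = sum(t * t / len(g) for t, g in zip(group_totals, groups))  # step (4)
    correction = grand_total ** 2 / n                       # step (5)
    total_ss = raw_ss - correction                          # step (6)
    ssb = between_raw - correction                          # step (7)
    ssw = total_ss - ssb                                    # step (8)
    msb, msw = ssb / (k - 1), ssw / (n - k)
    return {"df": (k - 1, n - k), "ssb": ssb, "ssw": ssw,
            "msb": msb, "msw": msw, "F": msb / msw}

# Coded values u = 100(x - 1.28) from Table 18.1, one list per die
coded = [[2, 4, 8, 7, 4, 9], [0, 7, 5, 6], [4, 1, 3, 0, 5, 2],
         [3, 1, 5, 3, 4], [2, 4, 2, 5]]
result = one_way_anova(coded)
print(result)

# The same test on the weights in gm., recovered from the coded values
raw = [[1.28 + u / 100 for u in g] for g in coded]
result_raw = one_way_anova(raw)
print(result_raw["F"])
```

The observed F is then compared with the tabulated value for df = (4, 20), as in Table 18.2.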


18.3 Comparison of variances of more than two normal
distributions

Suppose, as in the previous section, that there are k populations
in each of which the variable x is normally distributed and from
which k independent random samples of sizes $n_i \ge 2$ ($i = 1, 2, \ldots, k$)
have been taken. We want to test the hypothesis of equality of
variances (or of standard deviations), i.e. the hypothesis

$$H_0: \sigma_1 = \sigma_2 = \cdots = \sigma_k.$$

The means as well as the standard deviations of the populations
will be assumed to be unknown.

Let $s_i^2$ be the variance of the ith sample, so that

$$s_i^2 = \frac{1}{n_i}\sum_{j=1}^{n_i} (x_{ij} - \bar x_i)^2, \tag{18.10}$$

and let

$$s^2 = \sum_{i=1}^{k} n_i s_i^2\big/n. \tag{18.10a}$$

The usual test for $H_0$ is based on the likelihood-ratio principle
and has as its critical region

$$\lambda \le \lambda_\alpha,$$

where

$$\lambda = \frac{\text{weighted geometric mean of sample variances}}{\text{weighted arithmetic mean of sample variances}} = \prod_{i=1}^{k} (s_i^2)^{n_i/n}\Big/ s^2, \tag{18.11}$$

with $n = \sum_i n_i$, and $\lambda_\alpha$ is such that

$$P[\lambda \le \lambda_\alpha \mid H_0] = \alpha. \tag{18.12}$$

Bartlett's test statistic $\mu$ is a modification of the above, where $s_i^2$ is
replaced by $s_i'^2$ (the sample variance with df as divisor) and $n_i$ by
$\nu_i = n_i - 1$. Indeed, he suggests as the test criterion $-\nu\ln\mu = M$, so that

$$M = \nu\ln\Big(\sum_{i=1}^{k} \nu_i s_i'^2\big/\nu\Big) - \sum_{i=1}^{k} \nu_i\ln s_i'^2, \tag{18.13}$$

where $\nu = \sum_i \nu_i$. Provided none of the numbers $\nu_i$ is small, M is
distributed, under $H_0$, as approximately a $\chi^2$ with df $= k-1$.
For small samples, Bartlett has shown that

$$M' = M\big/\{1 + c_1/3(k-1)\}, \tag{18.14}$$

where

$$c_1 = \sum_i \frac{1}{\nu_i} - \frac{1}{\nu}, \tag{18.14a}$$

follows, under $H_0$, the $\chi^2$ distribution with df $= k-1$ more closely
than M.

The percentage points of M can be found in Biometrika Tables,
Vol. I (Table 32).


Example 18.2 Five varieties of tomato are to be compared in
respect of variation in yield from plant to plant. Plants for these
varieties were grown under similar conditions. The number of
plants and the observed variance of yield (in gm.) for each variety
are shown below:

Variety No.               1       2       3       4       5
Number of plants         18      21      29      22      20
Observed variance
(with df as divisor)    9812    8059    8523    6917    7021

We assume that for each variety the population distribution of
yield per plant is normal and that the $n_i$ yields for plants of the ith
variety ($i = 1, 2, \ldots, 5$) are a random sample from this distribution.

The following table is meant to facilitate the computations.

TABLE 18.3
COMPUTATIONS FOR TESTING EQUALITY OF VARIANCES

Sample
variance     df      $\log s_i'^2$   $\nu_i s_i'^2$   $\nu_i\log s_i'^2$   $1/\nu_i$
$s_i'^2$     $\nu_i$

  9812       17      3.99176      166804       67.85992      0.05882
  8059       20      3.90628      161180       78.12560      0.05000
  8523       28      3.93059      238644      110.05652      0.03571
  6917       21      3.83992      145257       80.63832      0.04762
  7021       19      3.84640      133399       73.08160      0.05263
Total       105         -         845284      409.76196      0.24478

From the totals in the table, we have, since ln 10 = 2.30259 and
1/105 = 0.00952,

M = 2.30259[105 × log(845284/105) - 409.76196]
  = 2.30259(410.11040 - 409.76196)
  = 0.8023

and $c_1$ = 0.24478 - 0.00952 = 0.23526.

Since k - 1 = 4, we also have

$$M' = M\Big/\Big\{1 + \frac{0.23526}{12}\Big\} = 0.8023/1.0196 = 0.7869.$$

On comparing this with $\chi^2_{.05,\,4} = 9.49$ and $\chi^2_{.01,\,4} = 13.28$, we
conclude that the sample variances do not differ significantly (i.e. do
not indicate any real variation among the population variances).
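
A short Python sketch of (18.13), (18.14) and (18.14a), applied to the figures of Example 18.2, is given below. It works with natural logarithms directly, so the factor 2.30259 is not needed; the function name `bartlett_m` is an illustrative choice.

```python
import math

def bartlett_m(variances, dfs):
    """Bartlett's M (18.13) and the corrected M' (18.14) from unbiased variances."""
    k = len(dfs)
    nu = sum(dfs)
    pooled = sum(v * s2 for v, s2 in zip(dfs, variances)) / nu
    m = nu * math.log(pooled) - sum(v * math.log(s2) for v, s2 in zip(dfs, variances))
    c1 = sum(1.0 / v for v in dfs) - 1.0 / nu   # (18.14a)
    m_prime = m / (1.0 + c1 / (3 * (k - 1)))
    return m, m_prime

# Sample variances (df as divisor) and df for the five tomato varieties
variances = [9812, 8059, 8523, 6917, 7021]
dfs = [17, 20, 28, 21, 19]

m, m_prime = bartlett_m(variances, dfs)
print(f"M = {m:.4f}, M' = {m_prime:.4f}, df = {len(dfs) - 1}")
```

Either value is referred to the $\chi^2$ distribution with df = k - 1 = 4.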


18.4 Tests for multiple and partial correlation coefficients

Many problems regarding the joint distribution of more than two
variables may also be solved by using the four basic distributions
considered at the beginning of this chapter.

We shall consider below the problems of testing for a multiple
correlation coefficient and for a partial correlation coefficient. Let
$x_1, x_2, \ldots, x_p$ be p variables whose joint distribution is of the
p-variate normal form. Let $\rho_{1\cdot 23\ldots p}$ and $\rho_{12\cdot 34\ldots p}$ be the population
multiple correlation of $x_1$ on $x_2, x_3, \ldots, x_p$ and the population
partial correlation of $x_1$ and $x_2$ eliminating the effects of $x_3, x_4, \ldots, x_p$.
Further, suppose $r_{1\cdot 23\ldots p}$ and $r_{12\cdot 34\ldots p}$ are the corresponding
sample coefficients based on a random sample of size n ($> p+1$) from
the p-variate distribution.

It can be shown that if $\rho_{1\cdot 23\ldots p} = 0$, then the statistic

$$\frac{r^2_{1\cdot 23\ldots p}/(p-1)}{(1 - r^2_{1\cdot 23\ldots p})/(n-p)} \tag{18.15}$$

is distributed as an F with df $= (p-1, n-p)$. This statistic, therefore,
supplies a test for $H_0: \rho_{1\cdot 23\ldots p} = 0$.

Similarly,

$$\frac{r_{12\cdot 34\ldots p}\sqrt{n-p}}{\sqrt{1 - r^2_{12\cdot 34\ldots p}}}, \tag{18.16}$$

which is a t with df $= n-p$ when $\rho_{12\cdot 34\ldots p} = 0$, supplies a test for
$H_0: \rho_{12\cdot 34\ldots p} = 0$.

Example 18.3 Sixty students are examined in statistics, physics
and mathematics. The total correlations between the scores obtained
are

$$r_{12} = 0.64, \quad r_{13} = 0.75 \quad\text{and}\quad r_{23} = 0.82$$

(where $x_1$, $x_2$ and $x_3$ are taken to denote scores in statistics, physics
and mathematics, respectively). It is conjectured that the correlation
between $x_1$ and $x_2$ is due to the influence of $x_3$ on both $x_1$ and $x_2$.
Test whether this conjecture seems valid in the light of the above
data.

The conjecture may be said to be valid if the population partial
correlation coefficient of $x_1$ and $x_2$, eliminating the effect of $x_3$ from
both, is zero. Hence we have to test the hypothesis $H_0: \rho_{12\cdot 3} = 0$
against the alternative $H: \rho_{12\cdot 3} \ne 0$.

Under the assumptions of normality of $x_1, x_2, x_3$ and of randomness
and independence of the sample observations, the statistic to
be used for making the test is

$$t = \frac{r_{12\cdot 3}\sqrt{n-3}}{\sqrt{1 - r^2_{12\cdot 3}}}$$

with $n - 3 = 57$ degrees of freedom.

Here

$$r_{12\cdot 3} = \frac{r_{12} - r_{13}r_{23}}{\sqrt{1 - r_{13}^2}\sqrt{1 - r_{23}^2}} = \frac{0.0250}{\sqrt{0.4375}\sqrt{0.3276}} = 0.0661.$$

Hence

$$t = \frac{0.0661\sqrt{57}}{\sqrt{0.9956}} = \frac{0.4990}{0.9978} = 0.500.$$

On comparing this with the tabulated values

$$t_{.025,\,57} = 2.003 \quad\text{and}\quad t_{.005,\,57} = 2.657,$$

we find that $H_0$ is to be accepted. In other words, the conjecture
that the correlation between $x_1$ and $x_2$ is due to the effect of $x_3$ upon
them seems to be borne out by the data.
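
The arithmetic of Example 18.3 can be checked with the following Python sketch. The function names `partial_r` and `partial_t` are illustrative; the first applies the formula for $r_{12\cdot 3}$ in terms of the total correlations, the second the statistic (18.16) with p variables in all.

```python
import math

def partial_r(r12, r13, r23):
    """Sample partial correlation of x1 and x2, eliminating x3."""
    return (r12 - r13 * r23) / math.sqrt((1.0 - r13 ** 2) * (1.0 - r23 ** 2))

def partial_t(r, n, p):
    """t statistic (18.16) with df = n - p for a partial correlation r."""
    return r * math.sqrt(n - p) / math.sqrt(1.0 - r ** 2)

r = partial_r(0.64, 0.75, 0.82)
t = partial_t(r, n=60, p=3)
print(f"r_12.3 = {r:.4f}, t = {t:.3f} with df = {60 - 3}")
```

A total correlation of 0.64 thus shrinks to a partial correlation of about 0.07 once the common dependence on $x_3$ is removed.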


18.5 Problems relating to a multiple regression equation

Let $x_1, x_2, \ldots, x_p$ be variables such that the true regression
equation of $x_1$ on $x_2, x_3, \ldots, x_p$ is linear, say

$$E(x_1 \mid x_2, \ldots, x_p) = \beta_1 + \beta_2x_2 + \beta_3x_3 + \cdots + \beta_px_p. \tag{18.17}$$

Here $x_1$ is, of course, a random variable while $x_2, x_3, \ldots, x_p$
may be either random or non-random. In either case, it is
assumed that the distribution of $x_1$ for given $x_2, x_3, \ldots, x_p$ is
normal with mean given by the right-hand side of (18.17) and a
constant variance, say $\sigma^2$. (In case $x_2, x_3, \ldots, x_p$ are random, this
distribution is a conditional distribution and the stated condition
will hold if, in particular, the joint distribution of $x_1, x_2, \ldots, x_p$
is multivariate normal.)

As in the case of simple regression, let us first suppose that
$x_2, x_3, \ldots, x_p$ are non-random. The fixed values of $x_2, x_3, \ldots, x_p$
may be denoted by $x_{2\alpha}, x_{3\alpha}, \ldots, x_{p\alpha}$ ($\alpha = 1, 2, \ldots, n$) and let $x_{1\alpha}$
($\alpha = 1, 2, \ldots, n$) be, respectively, independent random samples of
size one each from the corresponding distributions of $x_1$. It will be
assumed that the matrix

$$S = \begin{pmatrix} S_{22} & S_{23} & \cdots & S_{2p} \\ S_{32} & S_{33} & \cdots & S_{3p} \\ \vdots & \vdots & & \vdots \\ S_{p2} & S_{p3} & \cdots & S_{pp} \end{pmatrix}, \tag{18.18}$$

where $S_{ij} = \sum_\alpha (x_{i\alpha} - \bar x_i)(x_{j\alpha} - \bar x_j)$
and $\bar x_i = \sum_\alpha x_{i\alpha}/n$,

is positive definite and that $n > p$.

The estimated (least-square) regression equation will be

$$X_1 = b_1 + b_2x_2 + b_3x_3 + \cdots + b_px_p, \tag{18.19}$$

where $b_2, b_3, \ldots, b_p$ are the solutions of the equations

$$\left.\begin{aligned} S_{21} &= b_2S_{22} + b_3S_{23} + \cdots + b_pS_{2p} \\ S_{31} &= b_2S_{32} + b_3S_{33} + \cdots + b_pS_{3p} \\ &\ \ \vdots \\ S_{p1} &= b_2S_{p2} + b_3S_{p3} + \cdots + b_pS_{pp} \end{aligned}\right\} \tag{18.20}$$

and $b_1 = \bar x_1 - b_2\bar x_2 - b_3\bar x_3 - \cdots - b_p\bar x_p$.

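In order to determine the joint sampling distribution of $b_2, b_3, \ldots, b_p$, let us write

$$\mathbf{x}_1 = \begin{pmatrix} x_{11} \\ x_{12} \\ \vdots \\ x_{1n} \end{pmatrix}, \qquad A = \begin{pmatrix} x_{21}-\bar{x}_2 & x_{31}-\bar{x}_3 & \cdots & x_{p1}-\bar{x}_p \\ x_{22}-\bar{x}_2 & x_{32}-\bar{x}_3 & \cdots & x_{p2}-\bar{x}_p \\ \vdots & \vdots & & \vdots \\ x_{2n}-\bar{x}_2 & x_{3n}-\bar{x}_3 & \cdots & x_{pn}-\bar{x}_p \end{pmatrix}, \tag{18.21a, b}$$

so that

$$S = A'A, \tag{18.21c}$$

and

$$\mathbf{b} = (b_2, b_3, \ldots, b_p)'. \tag{18.21d}$$

Since

$$S_{i1} = \sum_{\alpha} (x_{i\alpha}-\bar{x}_i)(x_{1\alpha}-\bar{x}_1) = \sum_{\alpha} (x_{i\alpha}-\bar{x}_i)\,x_{1\alpha},$$

the normal equations (18.20) may then be put in the form

$$A'\mathbf{x}_1 = S\mathbf{b}. \tag{18.22}$$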


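As such, we have

$$\mathbf{b} = S^{-1}A'\mathbf{x}_1, \tag{18.23}$$

which expresses the estimated regression coefficients as linear functions of the random variables $x_{1\alpha}$ ($\alpha = 1, 2, \ldots, n$), which are normally and independently distributed with means given by (18.17) and a common variance, $\sigma^2$. It is then obvious that $b_2, b_3, \ldots, b_p$ are distributed in the multivariate normal form. They have the mean vector

$$E(\mathbf{b}) = E[S^{-1}A'(\mathbf{x}_1 - \bar{x}_1\mathbf{1})], \quad\text{since } A'\mathbf{1} = \mathbf{0},$$

$$= S^{-1}A'E(\mathbf{x}_1 - \bar{x}_1\mathbf{1}) = S^{-1}A'A\boldsymbol{\beta} = \boldsymbol{\beta}, \tag{18.24}$$

where $\boldsymbol{\beta} = (\beta_2, \beta_3, \ldots, \beta_p)'$, and the dispersion (or variance-covariance) matrix

$$\operatorname{var}(\mathbf{b}) = E(\mathbf{b}-\boldsymbol{\beta})(\mathbf{b}-\boldsymbol{\beta})' = S^{-1}A'\,\sigma^2 I\,AS^{-1} = \sigma^2 S^{-1}, \tag{18.25}$$

since $\mathbf{b}-\boldsymbol{\beta} = S^{-1}A'(\mathbf{x}_1 - A\boldsymbol{\beta})$, implying that

$$(\mathbf{b}-\boldsymbol{\beta})(\mathbf{b}-\boldsymbol{\beta})' = S^{-1}A'(\mathbf{x}_1 - A\boldsymbol{\beta})(\mathbf{x}_1 - A\boldsymbol{\beta})'AS^{-1},$$

i.e.

$$E(\mathbf{b}-\boldsymbol{\beta})(\mathbf{b}-\boldsymbol{\beta})' = S^{-1}A'E(\mathbf{x}_1 - A\boldsymbol{\beta})(\mathbf{x}_1 - A\boldsymbol{\beta})'AS^{-1},$$

while, apart from a term that vanishes on pre-multiplication by $A'$ (because $A'\mathbf{1} = \mathbf{0}$), $E(\mathbf{x}_1 - A\boldsymbol{\beta})(\mathbf{x}_1 - A\boldsymbol{\beta})'$ is the variance-covariance matrix of $x_{11}, x_{12}, \ldots, x_{1n}$, viz. $\sigma^2 I$.

Hence

$$\operatorname{var}(b_i) = \sigma^2 S^{ii} \tag{18.25a}$$

and

$$\operatorname{cov}(b_i, b_j) = \sigma^2 S^{ij}, \tag{18.25b}$$

where $S^{ij}$ is the $(i, j)$th element of the reciprocal matrix of $S$, viz. $S^{-1}$.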
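The computation described by (18.18) to (18.23) can be sketched in a few lines of Python. This is an illustrative sketch only: the function names and the small artificial data set are assumptions introduced here, and the equations $S\mathbf{b} = (S_{21}, \ldots, S_{p1})'$ are solved by plain Gauss-Jordan elimination rather than by any particular library routine.

```python
def corrected_sums(columns):
    """Means and corrected sums of products S_ij.

    columns[0] holds the values of x_1; columns[1:] hold x_2, ..., x_p.
    """
    n = len(columns[0])
    means = [sum(col) / n for col in columns]
    S = [[sum((ci[a] - mi) * (cj[a] - mj) for a in range(n))
          for cj, mj in zip(columns, means)]
         for ci, mi in zip(columns, means)]
    return means, S


def solve(matrix, rhs):
    """Solve matrix * z = rhs by Gauss-Jordan elimination with partial pivoting."""
    m = len(rhs)
    aug = [list(row) + [r] for row, r in zip(matrix, rhs)]
    for c in range(m):
        pivot = max(range(c, m), key=lambda r: abs(aug[r][c]))
        if abs(aug[pivot][c]) < 1e-12:
            raise ValueError("S is singular; the coefficients are not unique")
        aug[c], aug[pivot] = aug[pivot], aug[c]
        for r in range(m):
            if r != c:
                factor = aug[r][c] / aug[c][c]
                aug[r] = [u - factor * v for u, v in zip(aug[r], aug[c])]
    return [aug[i][m] / aug[i][i] for i in range(m)]


def fit(columns):
    """Return (b_1, [b_2, ..., b_p]) from the normal equations (18.20)."""
    means, S_full = corrected_sums(columns)
    S = [row[1:] for row in S_full[1:]]      # S_ij for i, j = 2, ..., p
    rhs = [row[0] for row in S_full[1:]]     # S_21, ..., S_p1
    b = solve(S, rhs)
    b1 = means[0] - sum(bi * mi for bi, mi in zip(b, means[1:]))
    return b1, b


# Artificial data obeying x_1 = 1 + 2 x_2 + 3 x_3 exactly (no error term).
x2 = [1.0, 2.0, 3.0, 4.0, 5.0]
x3 = [2.0, 1.0, 4.0, 3.0, 6.0]
x1 = [1.0 + 2.0 * u + 3.0 * v for u, v in zip(x2, x3)]
b1, b = fit([x1, x2, x3])
print(b1, b)
```

With error-free data the fitted coefficients reproduce the generating values, which provides a quick check on the arithmetic of the normal equations.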


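The above results also imply that $b_1$, which is a linear function of $\bar{x}_1, b_2, b_3, \ldots, b_p$, is normally distributed, with mean

$$E(b_1) = E(\bar{x}_1) - E(b_2)\bar{x}_2 - \cdots - E(b_p)\bar{x}_p = \Big(\beta_1 + \sum_{i=2}^{p}\beta_i\bar{x}_i\Big) - \sum_{i=2}^{p}\beta_i\bar{x}_i = \beta_1 \tag{18.26}$$

and variance

$$\operatorname{var}(b_1) = \operatorname{var}(\bar{x}_1) + \bar{\mathbf{x}}'\operatorname{var}(\mathbf{b})\,\bar{\mathbf{x}} = \sigma^2\Big[\frac{1}{n} + \bar{\mathbf{x}}'S^{-1}\bar{\mathbf{x}}\Big], \tag{18.27}$$

where $\bar{\mathbf{x}} = (\bar{x}_2, \bar{x}_3, \ldots, \bar{x}_p)'$.

Again, for any given $\mathbf{x} = (x_2, x_3, \ldots, x_p)'$, the estimated value ($X_1$) of the mean of $x_1$, being a linear function of $\bar{x}_1, b_2, b_3, \ldots, b_p$, is normally distributed with

$$E(X_1) = E(b_1) + \sum_{i=2}^{p} E(b_i)\,x_i = \beta_1 + \sum_{i=2}^{p}\beta_i x_i = \xi_{\mathbf{x}} \tag{18.28}$$

and

$$\begin{aligned} \operatorname{var}(X_1) &= \operatorname{var}\Big\{\bar{x}_1 + \sum_{i=2}^{p} b_i(x_i-\bar{x}_i)\Big\} \\ &= \operatorname{var}(\bar{x}_1) + \operatorname{var}\Big[\sum_{i=2}^{p} b_i(x_i-\bar{x}_i)\Big] \\ &= \sigma^2\Big[\frac{1}{n} + \sum_{i=2}^{p}\sum_{j=2}^{p}(x_i-\bar{x}_i)(x_j-\bar{x}_j)S^{ij}\Big] \\ &= \sigma^2\Big[\frac{1}{n} + (\mathbf{x}-\bar{\mathbf{x}})'S^{-1}(\mathbf{x}-\bar{\mathbf{x}})\Big]. \end{aligned} \tag{18.29}$$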


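To indicate how tests of various hypotheses are to be performed and confidence limits are to be set to various parameters, let us assume that the common variance of $x_{1\alpha}$ ($\alpha = 1, 2, \ldots, n$) is unknown as well as the $\beta$'s. In the above formulae for variances, $\sigma^2$ will then be replaced by its unbiassed estimator

$$s^2_{1.23\ldots p} = \frac{\sum_{\alpha}\,[x_{1\alpha} - b_1 - b_2 x_{2\alpha} - \cdots - b_p x_{p\alpha}]^2}{n-p} = \frac{S_{11} - \sum_{i=2}^{p} b_i S_{i1}}{n-p} = \frac{S_{11} - \sum_{i=2}^{p}\sum_{j=2}^{p} b_i b_j S_{ij}}{n-p}. \tag{18.30}$$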


To test the hypothesis $H_0 : \beta_1 = \beta_1^0$, the appropriate test statistic will be

$$\frac{b_1 - \beta_1^0}{s_{1.23\ldots p}\sqrt{\dfrac{1}{n} + \bar{\mathbf{x}}'S^{-1}\bar{\mathbf{x}}}}, \tag{18.31}$$

which is, under the stated hypothesis, distributed as a $t$ with df $= n-p$.

More important perhaps will be tests concerning the regression coefficients $\beta_i$ ($i = 2, 3, \ldots, p$). For testing the hypothesis $H_0 : \beta_i = \beta_i^0$, the proper test statistic will be

$$\frac{b_i - \beta_i^0}{s_{1.23\ldots p}\sqrt{S^{ii}}}, \tag{18.32}$$

which is, under the stated hypothesis, distributed as a $t$ with df $= n-p$. But one may as well be interested in a hypothesis concerning the difference between two regression coefficients, say $\beta_i$ and $\beta_j$ ($i \neq j$). The appropriate test statistic for testing the hypothesis $H_0 : \beta_i - \beta_j = \delta_0$ will be

$$\frac{b_i - b_j - \delta_0}{s_{1.23\ldots p}\sqrt{S^{ii} - 2S^{ij} + S^{jj}}}. \tag{18.33}$$

This again is, under the stated null hypothesis, distributed as a $t$ with df $= n-p$.

We may also be required to test the hypothesis that the mean of the distribution of $x_1$ corresponding to a specified $\mathbf{x}$ has a specified value, say the hypothesis $H_0 : \xi_{\mathbf{x}} = \xi^0$. The appropriate test statistic will now be

$$\frac{X_1 - \xi^0}{s_{1.23\ldots p}\sqrt{\dfrac{1}{n} + (\mathbf{x}-\bar{\mathbf{x}})'S^{-1}(\mathbf{x}-\bar{\mathbf{x}})}}, \tag{18.34}$$

which too is, under the null hypothesis, distributed as a $t$ with df $= n-p$.

It should also be apparent from the above discussion that the observed values of

$$b_1 \mp t_{\alpha/2,\,n-p}\; s_{1.23\ldots p}\sqrt{\frac{1}{n} + \bar{\mathbf{x}}'S^{-1}\bar{\mathbf{x}}} \tag{18.35}$$

will serve as confidence limits to $\beta_1$ with confidence coefficient $1-\alpha$. Similarly, the observed values of

$$b_i \mp t_{\alpha/2,\,n-p}\; s_{1.23\ldots p}\sqrt{S^{ii}} \tag{18.36}$$

will be confidence limits to $\beta_i$ with confidence coefficient $1-\alpha$.


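But we may be required to set confidence limits to $\xi_{\mathbf{x}}$, the mean of $x_1$ corresponding to a specified $\mathbf{x}$. These limits, with confidence coefficient $1-\alpha$, will be the observed values of

$$X_1 \mp t_{\alpha/2,\,n-p}\; s_{1.23\ldots p}\sqrt{\frac{1}{n} + (\mathbf{x}-\bar{\mathbf{x}})'S^{-1}(\mathbf{x}-\bar{\mathbf{x}})}. \tag{18.37}$$

There is, finally, the problem of setting prediction limits to the value of $x_1$ for a randomly chosen individual for which the value $\mathbf{x}$ is known. It is assumed that this $x_1$ is independent of the observations $(x_{1\alpha}, x_{2\alpha}, \ldots, x_{p\alpha})$. Noting that $x_1 - X_1$ is normally distributed, with mean

$$E(x_1 - X_1) = \xi_{\mathbf{x}} - \xi_{\mathbf{x}} = 0$$

and variance

$$\operatorname{var}(x_1 - X_1) = \operatorname{var}(x_1) + \operatorname{var}(X_1) = \sigma^2\Big[1 + \frac{1}{n} + (\mathbf{x}-\bar{\mathbf{x}})'S^{-1}(\mathbf{x}-\bar{\mathbf{x}})\Big],$$

we have the statistic

$$\frac{x_1 - X_1}{s_{1.23\ldots p}\sqrt{1 + \dfrac{1}{n} + (\mathbf{x}-\bar{\mathbf{x}})'S^{-1}(\mathbf{x}-\bar{\mathbf{x}})}}$$

distributed as a $t$ with df $= n-p$. Consequently, the observed values of

$$X_1 \mp t_{\alpha/2,\,n-p}\; s_{1.23\ldots p}\sqrt{1 + \frac{1}{n} + (\mathbf{x}-\bar{\mathbf{x}})'S^{-1}(\mathbf{x}-\bar{\mathbf{x}})} \tag{18.38}$$

will be the desired prediction limits, the associated confidence coefficient being $1-\alpha$.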
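The confidence limits (18.37) and the prediction limits (18.38) differ only in the extra term 1 under the square root. A minimal sketch of both in Python follows; the function names are illustrative, the tabulated point $t_{\alpha/2,\,n-p}$ is supplied by the user (it is not computed here), and the numbers in the usage lines are artificial.

```python
import math


def quadratic_form(d, S_inv):
    """(x - xbar)' S^{-1} (x - xbar) for a deviation vector d."""
    k = len(d)
    return sum(d[i] * S_inv[i][j] * d[j] for i in range(k) for j in range(k))


def mean_and_prediction_limits(X1, s, n, d, S_inv, t_crit):
    """Limits (18.37) for the mean of x_1 and (18.38) for a new value of x_1.

    X1     : estimated value at the given x
    s      : residual s.d. s_{1.23...p}
    d      : deviations x_i - xbar_i (i = 2, ..., p)
    t_crit : tabulated t_{alpha/2, n-p}, taken from a table of t
    """
    q = quadratic_form(d, S_inv)
    half_mean = t_crit * s * math.sqrt(1.0 / n + q)
    half_pred = t_crit * s * math.sqrt(1.0 + 1.0 / n + q)
    return (X1 - half_mean, X1 + half_mean), (X1 - half_pred, X1 + half_pred)


# Artificial illustration: x equal to xbar, so the quadratic form vanishes.
conf, pred = mean_and_prediction_limits(
    X1=10.0, s=2.0, n=4, d=[0.0, 0.0],
    S_inv=[[1.0, 0.0], [0.0, 1.0]], t_crit=1.0)
print(conf, pred)
```

The prediction limits are always the wider pair, since they allow for the variability of the new observation as well as for the error in estimating its mean.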

It should now be noted that the procedures outlined above remain valid even when the 'independent variables' $x_2, x_3, \ldots, x_p$ are themselves random. The $t$-distributions obtained above may then be regarded as the conditional distributions of the respective statistics for given $x_{2\alpha}, x_{3\alpha}, \ldots, x_{p\alpha}$ ($\alpha = 1, 2, \ldots, n$). But since a $t$-distribution has the df as its only parameter, which is in these cases a constant (independent of $x_{2\alpha}, x_{3\alpha}, \ldots, x_{p\alpha}$), these will also be the unconditional distributions. Consequently, for instance,

$$\frac{b_i - \beta_i^0}{s_{1.23\ldots p}\sqrt{S^{ii}}}$$

will be distributed as a $t$ with df $= n-p$ under $H_0 : \beta_i = \beta_i^0$ in the present situation also. Hence this will still be used for testing $H_0 : \beta_i = \beta_i^0$.

It should also be noted that the above discussion applies to the case when there is a single 'independent' variable, say $x$, but the true regression of $x_1$ on $x$ is a polynomial of degree $p-1$. For we may just put

$$x_i = x^{\,i-1} \qquad (i = 2, 3, \ldots, p) \tag{18.39}$$

in the various expressions.
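The substitution (18.39) amounts to nothing more than building the columns $x_2, x_3, \ldots, x_p$ from successive powers of the single variable $x$. A small illustrative sketch in Python (the function name is an assumption made here):

```python
def polynomial_columns(x_values, p):
    """Columns x_2, ..., x_p with x_i = x**(i - 1), as in (18.39)."""
    return [[x ** (i - 1) for x in x_values] for i in range(2, p + 1)]


# A quadratic regression (degree p - 1 = 2) needs the columns x and x^2.
cols = polynomial_columns([1, 2, 3], p=3)
print(cols)
```

These columns may then be treated exactly like the values of $p-1$ distinct explanatory variables in the formulae of this section.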


18.6 The problem of multicollinearity 

From the nature of the normal equations (18.20) it will be apparent that the regression coefficients $\beta_i$ ($i = 2, 3, \ldots, p$) cannot be (uniquely) estimated in case the matrix $S$ is singular, i.e. is of rank less than $p-1$. But this is the case where the 'independent' (or explanatory) variables $x_2, x_3, \ldots, x_p$ are exactly linearly related. Thus in case the determinant $|S|$ is zero, or in other words, the explanatory variables are exactly linearly related, it is impossible to estimate the regression coefficients.

Next, let us consider the situation where $|S| \neq 0$ (i.e. $|S| > 0$) but is very nearly zero. We have seen already that

$$\operatorname{var}(b_i) = \sigma^2 S^{ii} \qquad (i = 2, 3, \ldots, p).$$

Now, $S^{ii}$ equals the cofactor of $S_{ii}$ in $S$ divided by $|S|$, so that a small value of $|S|$ will make the variance (or the standard error) of $b_i$ large. In other words, the variances of the estimated regression coefficients will become large if the independent variables are closely related to each other.

Both the above situations come under the problem of multicollinearity. Thus owing to multicollinearity, it may become impossible to estimate the coefficients in a regression equation. Alternatively, it may be possible to estimate the coefficients but, owing to multicollinearity, the estimators will be imprecise in the sense that they will have high standard errors.

This points to the need for exercising caution in the choice of the explanatory variables. The explanatory variables must not be such that they are likely to be exactly or very nearly linearly related to each other.
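The effect may be illustrated numerically for the simplest case of two explanatory variables. In the sketch below (Python; the scaling $S_{22} = S_{33} = 1$, $S_{23} = r$ is an assumption made purely for illustration), the element $S^{22}$ reduces to $1/(1-r^2)$, which grows without limit as $r$ approaches 1.

```python
def variance_factor(r):
    """S^{22} for S = [[1, r], [r, 1]]: cofactor over determinant, 1 / (1 - r^2)."""
    det = 1.0 - r * r
    if det == 0.0:
        raise ZeroDivisionError("|S| = 0: the coefficients cannot be estimated")
    return 1.0 / det


# var(b_2) is sigma^2 times this factor; watch it grow as r tends to 1.
for r in (0.0, 0.9, 0.99, 0.999):
    print(r, round(variance_factor(r), 1))
```

With $r = 0.99$ the variance of $b_2$ is already about fifty times what it would be for unrelated explanatory variables.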




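Example 18.4 Consider the data of Example 13.1. For these data,

$$\bar{x}_1 = 32.278, \quad \bar{x}_2 = 9.9444, \quad \bar{x}_3 = 3.6667,$$

while

$$S_{11} = 3539.61, \quad S_{12} = 858.28, \quad S_{13} = 256.67, \quad S_{22} = 352.94, \quad S_{23} = 58.667, \quad S_{33} = 36.000.$$

Assuming that $x_1$, $x_2$ and $x_3$ are jointly normally distributed in the population and that the given set of observations constitutes a random sample of size $n = 18$ from the trivariate normal distribution, we may proceed to test various hypotheses regarding the population regression equation of $x_1$ on $x_2$ and $x_3$, say the equation

$$x_1 = \beta_1 + \beta_2 x_2 + \beta_3 x_3.$$

In the present case, we have

$$\bar{\mathbf{x}} = \begin{pmatrix} \bar{x}_2 \\ \bar{x}_3 \end{pmatrix} = \begin{pmatrix} 9.9444 \\ 3.6667 \end{pmatrix}, \qquad S = \begin{pmatrix} S_{22} & S_{23} \\ S_{32} & S_{33} \end{pmatrix} = \begin{pmatrix} 352.94 & 58.667 \\ 58.667 & 36.000 \end{pmatrix},$$

so that

$$|S| = S_{22}S_{33} - S_{23}^2 = 12705.84 - 3441.82 = 9264.02,$$

$$S^{22} = 36.000/9264.02 = 0.003886, \quad S^{23} = -58.667/9264.02 = -0.006333 \quad\text{and}\quad S^{33} = 352.94/9264.02 = 0.038098.$$

Hence

$$\begin{pmatrix} b_2 \\ b_3 \end{pmatrix} = S^{-1}\begin{pmatrix} S_{21} \\ S_{31} \end{pmatrix} = \begin{pmatrix} 0.003886 & -0.006333 \\ -0.006333 & 0.038098 \end{pmatrix}\begin{pmatrix} 858.28 \\ 256.67 \end{pmatrix} = \begin{pmatrix} 3.33528 - 1.62549 \\ -5.43549 + 9.77861 \end{pmatrix} = \begin{pmatrix} 1.7098 \\ 4.3431 \end{pmatrix}$$

and

$$b_1 = \bar{x}_1 - b_2\bar{x}_2 - b_3\bar{x}_3 = 32.278 - 1.7098 \times 9.9444 - 4.3431 \times 3.6667 = 32.278 - 17.0029 - 15.9248 = -0.6497.$$

Also, since $p = 3$ here, (18.30) gives

$$s^2_{1.23} = (S_{11} - b_2 S_{21} - b_3 S_{31})/(n-3) = (3539.61 - 1.7098 \times 858.28 - 4.3431 \times 256.67)/15 = 957.379/15 = 63.8253,$$

so that $s_{1.23} = 7.9891$.

Suppose we are to test whether $\beta_2$ is significantly greater than zero. We are then to compute

$$\frac{b_2}{s_{1.23}\sqrt{S^{22}}}.$$

This has, for the data, the value

$$\frac{1.7098}{7.9891\sqrt{0.003886}} = \frac{1.7098}{7.9891 \times 0.06234} = 3.433.$$

On comparing this value with $t_{.05,\,15} = 1.753$ and $t_{.01,\,15} = 2.602$, we find that $H_0 : \beta_2 = 0$ is to be rejected. In other words, the data indicate that $\beta_2$ is significantly greater than zero.

Suppose also that we are to set 95% confidence limits to the mean yield of dry bark for a plant that has height 20 inches and girth 5 inches (at a height of 6″). We have here

$$\mathbf{x} = \begin{pmatrix} 20 \\ 5 \end{pmatrix}, \qquad \bar{\mathbf{x}} = \begin{pmatrix} 9.9444 \\ 3.6667 \end{pmatrix} \quad\text{and}\quad S^{-1} = \begin{pmatrix} 0.003886 & -0.006333 \\ -0.006333 & 0.038098 \end{pmatrix}.$$

Hence

$$\begin{aligned} s_{1.23}\sqrt{\frac{1}{n} + (\mathbf{x}-\bar{\mathbf{x}})'S^{-1}(\mathbf{x}-\bar{\mathbf{x}})} &= 7.9891\sqrt{\tfrac{1}{18} + \{(10.0556)^2 \times 0.003886 + 2 \times 10.0556 \times 1.3333 \times (-0.006333) + (1.3333)^2 \times 0.038098\}} \\ &= 7.9891\sqrt{0.055556 + \{0.392933 - 0.169815 + 0.067726\}} \\ &= 7.9891\sqrt{0.346400} = 4.7020. \end{aligned}$$

Since, for the given $\mathbf{x}$,

$$X_1 = b_1 + b_2 x_2 + b_3 x_3 = -0.6497 + 1.7098 \times 20 + 4.3431 \times 5 = 55.2618,$$

the 95% confidence limits are

$$X_1 - t_{.025,\,15} \times 4.7020 = 55.2618 - 2.131 \times 4.7020 = 55.2618 - 10.0200 = 45.242 \text{ oz.}$$

and

$$X_1 + t_{.025,\,15} \times 4.7020 = 55.2618 + 10.0200 = 65.282 \text{ oz.}$$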
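The computations of this example may be verified with a short script. The sketch below (Python, standard library only) starts from the means and the corrected sums of squares and products quoted in the example; the variable names are illustrative. The residual variance is obtained with divisor $n - p = 15$, as prescribed by (18.30), and unrounded intermediate values are carried throughout, so the last digits differ slightly from hand computation with rounded figures.

```python
import math

n, p = 18, 3
xbar1, xbar2, xbar3 = 32.278, 9.9444, 3.6667
S11, S12, S13 = 3539.61, 858.28, 256.67
S22, S23, S33 = 352.94, 58.667, 36.000

# Elements of S^{-1} for the 2 x 2 matrix S = [[S22, S23], [S23, S33]].
det = S22 * S33 - S23 ** 2
inv22, inv23, inv33 = S33 / det, -S23 / det, S22 / det

# Regression coefficients from b = S^{-1} (S21, S31)'.
b2 = inv22 * S12 + inv23 * S13
b3 = inv23 * S12 + inv33 * S13
b1 = xbar1 - b2 * xbar2 - b3 * xbar3

# Residual sum of squares and residual s.d. with divisor n - p.
resid_ss = S11 - b2 * S12 - b3 * S13
s = math.sqrt(resid_ss / (n - p))

# Estimated mean and its standard error at height 20 in., girth 5 in.
d2, d3 = 20 - xbar2, 5 - xbar3
q = d2 * d2 * inv22 + 2 * d2 * d3 * inv23 + d3 * d3 * inv33
X1 = b1 + b2 * 20 + b3 * 5
se_mean = s * math.sqrt(1 / n + q)

print(round(b1, 4), round(b2, 4), round(b3, 4))
print(round(resid_ss, 2), round(s, 4), round(X1, 3), round(se_mean, 4))
```

Multiplying `se_mean` by the tabulated $t$ point for 15 df then gives the half-width of the confidence interval for the mean.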




18.7 Combination of tests 

When several tests of the same hypothesis $H_0$ are made on the basis of independent sets of data, it is quite likely that some of the tests will dictate rejection of the hypothesis (at the chosen level of significance) while the others will dictate its acceptance. In such a case, one would naturally like to have a means of combining the results of the individual tests to reach a firm, over-all decision. While one may well apply the same test to the combined set of data, what we are envisaging is a situation where only the values of the test statistics used are available.

Let us denote by $T_i$ the statistic used in making the $i$th test (say, for $i = 1, 2, \ldots, k$). Commonly, $T_1, T_2, \ldots, T_k$ will be statistics defined in the same way (like $\chi^2$ statistics or $t$ statistics), but with varying sampling distributions simply because they are based on varying sample sizes. To fix ideas, let us assume that in each case the test requires that $H_0$ be rejected if, and only if, the observed value of the corresponding statistic be too large. Consider, in this situation, the probabilities

$$\lambda_i = P[T_i > t_i \mid H_0], \quad\text{for } i = 1, 2, \ldots, k. \tag{18.40}$$

Provided $T_i$ has a continuous distribution under $H_0$, say with p.d.f. $g_i(t)$, so that

$$\lambda_i = \int_{t_i}^{\infty} g_i(t)\,dt, \tag{18.40a}$$

where $t_i$ is a randomly taken value of $T_i$, $\lambda_i$ has the rectangular distribution over the interval [0, 1] under $H_0$ and hence $-2\ln\lambda_i$ has the $\chi^2$ distribution with df $= 2$. Consequently,

$$P_\lambda = -2\sum_{i=1}^{k}\ln\lambda_i \tag{18.41}$$

has, under $H_0$, the $\chi^2$ distribution with $2k$ degrees of freedom. This statistic is used as the test statistic for making the combined test. One would reject $H_0$ if, and only if, the observed value of $P_\lambda$ exceeded $\chi^2_{\alpha,\,2k}$.

The case where each individual test requires rejection of $H_0$ if, and only if, the observed value of the corresponding test statistic is too small, or the case where each individual test requires rejection of $H_0$ if, and only if, the observed value of the test statistic is either too large or too small, is to be similarly dealt with. The reason is that, if $T_i$ have continuous distributions under $H_0$, then

$$\lambda_i' = P[T_i < t_i \mid H_0] \tag{18.42}$$

and

$$\lambda_i'' = P[\,|T_i| > |t_i| \mid H_0] \tag{18.43}$$

are also rectangularly distributed over [0, 1]. This implies that the $P_\lambda$ statistics appropriate to these situations, viz.

$$P_\lambda' = -2\sum_{i=1}^{k}\ln\lambda_i' \tag{18.44}$$

and

$$P_\lambda'' = -2\sum_{i=1}^{k}\ln\lambda_i'', \tag{18.45}$$

are also distributed as $\chi^2$ statistics with df $= 2k$ under $H_0$. In each of these cases also, the over-all decision will be to reject $H_0$ if, and only if, the observed value of the respective $P_\lambda$ exceeds $\chi^2_{\alpha,\,2k}$.

Example 18.5 In order to test whether the mean height ($\mu$) of a variety of paddy plants, when fully grown, is 60 cm. or less than 60 cm., five experimenters made independent (Student's) $t$-tests with their respective data. The probabilities of the $t$-statistics (with the appropriate df in each case) to be less than their respective observed values are 0.023, 0.061, 0.017, 0.105 and 0.007. If the tests are made at the 5% level, then the hypothesis $H_0 : \mu = 60$ cm. has to be rejected in three cases out of the five and accepted in the other two.

In order to combine the results of the 5 tests, we note that, writing $\lambda_i'$ for these probabilities, $\log_{10}\lambda_i'$, for $i = 1, 2, 3, 4$ and 5, are $\bar{2}.36173$, $\bar{2}.78533$, $\bar{2}.23045$, $\bar{1}.02119$ and $\bar{3}.84510$, respectively.

Hence for the data, $\sum_{i=1}^{5}\log_{10}\lambda_i' = -10 + 2.24380 = -7.75620$, so that

$$P_\lambda' = -2\sum_{i=1}^{5}\ln\lambda_i' = (\ln 10)\Big(-2\sum_{i=1}^{5}\log_{10}\lambda_i'\Big) = 2.30259 \times 15.5124 = 35.719.$$

This is to be compared with $\chi^2_{.05,\,10} = 18.307$ and $\chi^2_{.01,\,10} = 23.209$.

Since the observed value of $P_\lambda'$ exceeds the tabulated values, the combined results of the experimenters' tests lead to the rejection of $H_0$ at both the 5% and the 1% level.

In other words, in the light of all 5 experimenters' data, we may conclude that the mean height of the variety of paddy plants is less than 60 cm.
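The combined statistic of this example can be reproduced with a few lines of Python. This is a sketch under stated assumptions: the function names are illustrative, and the tail probability is obtained from the standard closed form for a $\chi^2$ variable with even df $2k$, namely $e^{-x/2}\sum_{j=0}^{k-1}(x/2)^j/j!$, instead of being read from a table.

```python
import math


def combined_statistic(probs):
    """-2 * sum of natural logs of the individual tail probabilities."""
    return -2.0 * sum(math.log(p) for p in probs)


def chi2_upper_tail_even_df(x, df):
    """P(chi-square with df = 2k exceeds x), valid for even df only."""
    if df <= 0 or df % 2:
        raise ValueError("df must be a positive even integer")
    half = x / 2.0
    k = df // 2
    return math.exp(-half) * sum(half ** j / math.factorial(j) for j in range(k))


probs = [0.023, 0.061, 0.017, 0.105, 0.007]   # probabilities quoted in the example
stat = combined_statistic(probs)
tail = chi2_upper_tail_even_df(stat, 2 * len(probs))
print(round(stat, 3), tail)
```

The tail probability obtained is far below 0.01, in agreement with the rejection of the hypothesis at both levels.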


18.8 Tests for outliers

A common problem that every practical statistician has to face at some time or other in the course of his work is to decide whether one or more of the observations available to him come from a distribution different from the distribution yielding the other observations. The problem arises in the context of suspected observational or recording errors. The enquirer scrutinises the data and gets the impression that some of the observations are too high or too low to be compatible with the assumption that they have all been obtained from the same distribution. What we require is some objective method of deciding whether the enquirer's suspicion is borne out by the data.

It should be noted that the problem considered here is distinct from the ordinary problem of comparing (the means or standard deviations of) two samples. For, in the present situation, one does not know in advance which of the set of data may have come from the discrepant distribution. Had this been known, two-sample techniques discussed in the earlier chapter could very well be used here also.

Let us confine ourselves to the case of univariate data. Since the suspicion is aroused by the nature of the tails of the observed distribution, the test statistics that suggest themselves are the smallest and largest order statistics, $x_{(1)}$ and $x_{(n)}$, or, rather, their deviations from some measure of location for the unsuspected observations. In case both high and low values are in suspicion, the sample range relative to the actual or estimated s.d. of the population would seem to be a suitable test criterion.

We shall assume that the data purport to be a random sample of size $n$ from a normal distribution, with some standard deviation $\sigma$.


When $\sigma$ is known, the test will be based on the standardised extreme deviate

$$u_1 = (\bar{x} - x_{(1)})/\sigma \tag{18.46}$$

or

$$u_n = (x_{(n)} - \bar{x})/\sigma, \tag{18.47}$$

as the case may be. Here $\bar{x}$ is the mean of the combined data. The outlying observation $x_{(1)}$ would be rejected as anomalous when the observed value of $u_1$ is too large (i.e. exceeds the tabulated value). Similarly, the outlier $x_{(n)}$ will be rejected as anomalous when the observed value of $u_n$ is too large. The percentage points of $u_1$ (which are the same as those of $u_n$) are tabulated in Biometrika Tables, Vol. I (Table 25). In case both $x_{(1)}$ and $x_{(n)}$ are under suspicion, one will reject them in case the value of the standardised range

$$w = (x_{(n)} - x_{(1)})/\sigma \tag{18.48}$$

is too large. Percentage points of $w$ are given in Table 22 of Biometrika Tables, Vol. I.

When $\sigma$ is unknown, we may consider a situation where an unbiassed estimate ($s'^2$) of $\sigma^2$ is available from an independent sample known to be drawn from the same distribution. The counterparts of $u_1$, $u_n$ and $w$ that are to be used in this situation are

$$v_1 = (\bar{x} - x_{(1)})/s', \tag{18.49}$$

$$v_n = (x_{(n)} - \bar{x})/s' \tag{18.50}$$

and

$$q = (x_{(n)} - x_{(1)})/s'. \tag{18.51}$$

Percentage points of $v_1$ and $v_n$ are available in Table 26 of Biometrika Tables, Vol. I, as well as in Table 9.4 of Formulae and Tables for Statistical Work by Rao, Mitra, Mathai and Ramamurthi. Percentage points of $q$, called the Studentised range, are given in Table 29 of Biometrika Tables, Vol. I, for varying $n$ and $\nu$ (the df of $s'$).

[In an ANOVA, for instance, the $n$ values $x_1, x_2, \ldots, x_n$ would be $n$ sample means being compared, while $s'^2$ would be MSW/$k$, supposing the samples are of the same size $k$.]




Example 18.6 For a sample of 9 light bulbs coming out of a bulb manufacturing factory, the figures for life (in hours) are given as follows :

    763    829    799    1129    1857
    607    982    805     895

It is known that the standard deviation of life for bulbs manufactured by the factory is 128 hours.

On going through the data, it immediately appears to us that the value 1857 is out of line with the other values in the set. It would be proper then to test this value as an outlier.

We make the assumptions that the values purport to be a random sample from the population of bulbs manufactured by the factory and that the population distribution of life (in hours) for such bulbs is normal with $\sigma = 128$.

Here $n = 9$, $x_{(n)} = 1857$, while

$$\bar{x} = \sum_{i=1}^{9} x_i/9 = 963.$$

As such, the test statistic $u_n = (x_{(n)} - \bar{x})/\sigma$ has the value

$$(1857 - 963)/128 = 6.984.$$

This is to be compared with the upper 5% and upper 1% points of the distribution of $u_n$, viz. 2.39 and 2.88. Since the observed value is significant at both the 5% and 1% levels, the outlier 1857 is to be rejected as anomalous.
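The statistics (18.46) to (18.48) are simple enough to compute directly. The following Python sketch (the function name is illustrative) applies them to the bulb data of this example; it uses the unrounded mean 962.89, so the value of $u_n$ differs in the third decimal place from the figure obtained with the rounded mean 963.

```python
def extreme_deviates(values, sigma):
    """Standardised extreme deviates u_1, u_n and standardised range w."""
    mean = sum(values) / len(values)
    u1 = (mean - min(values)) / sigma
    un = (max(values) - mean) / sigma
    w = (max(values) - min(values)) / sigma
    return u1, un, w


lives = [763, 829, 799, 1129, 1857, 607, 982, 805, 895]   # life in hours
u1, un, w = extreme_deviates(lives, sigma=128.0)
print(round(u1, 3), round(un, 3), round(w, 3))

# Upper 1% point of u_n for n = 9, as quoted in the example.
print(un > 2.88)
```

When $\sigma$ is unknown, the same function serves for $v_1$, $v_n$ and $q$ on passing the independent estimate $s'$ in place of $\sigma$.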

Example 18.7 The mean yield of dry bark (in oz.) per plant for each of 10 groups of cinchona plants, there being 5 plants in each group, is given as follows : 40, 53, 41, 38, 23, 58, 46, 39, 47, 50. It is known that the plants are all of the same variety and reared under similar conditions. An estimate of the variance of yield of dry bark (in oz.) per plant is available, viz. the within-group mean square, which is 342. The question is whether the value 23 is consistent with the other 9 values in the series.

Under the usual assumptions of normality of the population, randomness and independence of the samples, the appropriate test statistic will be

$$v_1 = (\bar{x} - x_{(1)})/s'.$$

For the data,

$$x_{(1)} = 23, \qquad \bar{x} = (40 + 53 + \cdots + 50)/10 = 42.5$$

and

$$s' = \sqrt{342/5} = 8.27,$$

so that $v_1$ has the value

$$(42.5 - 23)/8.27 = 2.36.$$

Also, here $n = 10$ and $\nu = 40$.

This value of $v_1$ is to be compared with the upper 5% and 1% points of the distribution of $v_1$, viz. 2.55 and 3.13. Since the observed value of $v_1$ is smaller than both the tabulated values, we find no reason to consider the observation 23 to be anomalous.


18.9 The normality assumption

The tests and confidence intervals for different parameters 
discussed in this and the earlier chapter have been generally 
derived under the assumption that the underlying distributions are 
of the normal type. By assuming that the underlying distributions 
are of a different type, naturally a different set of tests and confidence 
limits would be obtained. 

The results are, therefore, strictly valid in sampling from normally distributed populations. Some work has been done to have an idea as to whether they hold for other types of distribution as well. It has been found that provided the population distribution diverges only slightly from normality, the results given remain valid to a large extent. However, if in any case the population distribution departs markedly from normality, the above methods should not be used. Some methods for dealing with such situations will be discussed in Chapter 20. Some approximate procedures that often allow us to avoid the normality assumption will be considered in Chapter 19.


18.10 Tests for normality 
Most of the hypotheses and confidence intervals that we have considered so far (in this and the earlier chapter) are of the parametric type. In each case, the form of the population distribution or distributions has been assumed to be known (usually, to be either univariate or multivariate normal) and the inference problem has been concerned with the value (or values) of some unknown parameter (or parameters). But an experimenter may well like to judge whether the data warrant this assumption regarding the form of the distribution.
We shall confine ourselves to the case of univariate data. Thus given that $x_1, x_2, \ldots, x_n$ are obtained as a random sample from a univariate distribution, our problem is to judge the hypothesis that the population distribution of the variable ($x$) is of the normal type, i.e. has p.d.f.

$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\exp\Big[-\frac{(x-\mu)^2}{2\sigma^2}\Big]$$

with some (unspecified) mean $\mu$ and some variance $\sigma^2$.
Two methods are in common use for dealing with the problem :

(a) One may fit a normal distribution to the data (grouped into a frequency table) and then apply the $\chi^2$-test of goodness of fit (as described in Chapter 19).

(b) One may calculate certain coefficients based on the sample moments and then test for the significance of their departure from the values expected for a normal distribution.

The first procedure is in some ways advantageous, but cannot be applied when the total frequency is small. Besides, the $\chi^2$-test for goodness of fit is rather insensitive to real departure from normality, which is due to the need for grouping together small tail frequencies, while applying the test.

In the present discussion, we shall be concerned only with procedure (b). A procedure requiring use of distribution functions, suggested by Kolmogorov and Smirnov, will be described in Chapter 20.

Departure of

$$\sqrt{b_1} = m_3/m_2^{3/2} \tag{18.52}$$

from the 'normal' value 0 is an indication of skewness of the population distribution, while departure of

$$b_2 = m_4/m_2^2 \tag{18.53}$$

from the 'normal' value 3 is an indication of lepto- or platy-kurtosis.

When the sample size is large, rough tests of normality may be made by comparing $\sqrt{b_1}$ and $b_2 - 3$ with their approximate standard errors, viz. $\sqrt{6/n}$ and $\sqrt{24/n}$. For the case of moderately large samples ($n > 25$ for $\sqrt{b_1}$ and $n > 200$ for $b_2$), one may use Tables 34B and 34C of Biometrika Tables, Vol. I. These give the upper and lower 5% and 1% points of the distribution of $\sqrt{b_1}$, and those of the distribution of $b_2$. (These points have been computed by assuming that the sampling distributions of $\sqrt{b_1}$ and $b_2$ can be represented by Pearsonian curves having the same first four moments.)

An alternative statistic has been suggested by Geary for testing deviations from mesokurtosis. This will be particularly appropriate for samples containing less than 200 observations. (For such samples, the method employed in preparing Table 34C is not valid owing to the asymmetry of the sampling distribution of $b_2$.) Geary's statistic (called Geary's ratio) is

$$a = \frac{\text{sample mean deviation}}{\text{sample standard deviation}} = \frac{\frac{1}{n}\sum_i |x_i - \bar{x}|}{\big[\frac{1}{n}\sum_i (x_i - \bar{x})^2\big]^{1/2}}. \tag{18.54}$$

[For a normal distribution, the ratio (mean deviation)/(standard deviation) has the value $\sqrt{2/\pi} = 0.7979$ ; it is higher for platykurtic and lower for leptokurtic distributions.]

The distribution of Geary's ratio (for samples from a normal parent) tends to normality fairly rapidly and its expectation tends to the 'normal' value 0.7979.

Table 34A of Biometrika Tables, Vol. I, gives upper and lower 10, 5 and 1% points for $a$.
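The three coefficients $\sqrt{b_1}$, $b_2$ and $a$ can all be obtained from the sample central moments (with divisor $n$). The Python sketch below is only an illustration: the function name is an assumption, and the data are the 15 soyabean yields of Exercise 18.18, for which the partial answer quotes $\sqrt{b_1} = 0.627$ and $a = 0.797$.

```python
import math


def normality_coefficients(xs):
    """Return (sqrt(b1), b2, Geary's ratio a), using moments with divisor n."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m3 = sum((x - mean) ** 3 for x in xs) / n
    m4 = sum((x - mean) ** 4 for x in xs) / n
    mean_dev = sum(abs(x - mean) for x in xs) / n
    return m3 / m2 ** 1.5, m4 / m2 ** 2, mean_dev / math.sqrt(m2)


yields = [38, 15, 9, 67, 57, 42, 12, 46, 29, 29, 34, 31, 16, 25, 20]
root_b1, b2, a = normality_coefficients(yields)
print(round(root_b1, 3), round(b2, 3), round(a, 3))
```

The values so obtained are then referred to Tables 34A to 34C of Biometrika Tables, Vol. I, or, for large samples, to the approximate standard errors given above.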

Example 18.8 For the data of Example 9.3, we have (from Example 9.5) $\sqrt{b_1} = 0.5581$ and (from Example 9.7) $b_2 = 3.4606$.

From Tables 34B and 34C of Biometrika Tables, Vol. I, the upper and lower percentage points of the $\sqrt{b_1}$ distribution and the $b_2$ distribution (for a normal parent) for $n = 1,175$ are obtained as follows :

For $\sqrt{b_1}$ :

    Upper     0.167
    Lower    -0.167

and for $b_2$ :

              5%      1%
    Upper    3.24    3.37
    Lower    2.78    2.71

It would thus appear that the population distribution is markedly different from normal, being positively skew and leptokurtic.
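For a sample as large as this, the rough test based on the approximate standard errors $\sqrt{6/n}$ and $\sqrt{24/n}$ leads to the same conclusion. A minimal Python sketch, using the values of $\sqrt{b_1}$, $b_2$ and $n$ quoted in this example (the variable names are illustrative):

```python
import math

n = 1175
root_b1, b2 = 0.5581, 3.4606

# Ratios of the departures from the 'normal' values to their approximate s.e.'s.
z_skew = root_b1 / math.sqrt(6.0 / n)
z_kurt = (b2 - 3.0) / math.sqrt(24.0 / n)
print(round(z_skew, 2), round(z_kurt, 2))
```

Both ratios exceed 3, so that both departures are large in relation to their standard errors.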


Example 18.9 For the data of Table 6.1, we have, from Examples 7.2 and 7.3,

    mean deviation (about mean) = 86.95 gm.

and

    standard deviation = 97.35 gm.

Hence for these data,

$$a = 86.95/97.35 = 0.893.$$

From Table 34A of Biometrika Tables, Vol. I, for $n = 12$ (the sample size for our data) the percentage points of $a$ may be taken to be as follows :

              5%       1%
    Upper    0.908    0.932
    Lower    0.717    0.665

Since the observed value of $a$ falls between the lower and upper 5% points, the hypothesis of mesokurtosis, $H_0 : \beta_2 = 3$, is to be accepted at the 10% level.


Questions and exercises

18.1 Describe the method you would use to test for the equality of means of $k$ (> 2) univariate distributions, stating clearly the assumptions you would make.

18.2 Indicate, under appropriate assumptions (to be stated clearly), how one can test for the equality of variances of $k$ (> 2) univariate distributions. (Give the likelihood-ratio test and also the modifications suggested by Bartlett.)


18.3 For data on $p$ variables, say $x_1, x_2, \ldots, x_p$, where $x_1$ is stochastic but $x_2, x_3, \ldots, x_p$ are non-stochastic, if $x_1$ has expectation (depending on $x_2, x_3, \ldots, x_p$)

$$E(x_1) = \beta_1 + \beta_2 x_2 + \cdots + \beta_p x_p$$

and variance independent of $x_2, x_3, \ldots, x_p$, let the least-square regression equation of $x_1$ on $x_2, x_3, \ldots, x_p$ be

$$X_1 = b_1 + b_2 x_2 + \cdots + b_p x_p.$$

Obtain, under the usual assumptions (to be stated), the joint distribution of $b_2, b_3, \ldots, b_p$. Hence give $E(b_i)$, var($b_i$) and, for $i \neq j$, cov($b_i$, $b_j$). Also, give $E(b_1)$ and var($b_1$).

18.4 (Continuation) Suggest tests for the hypotheses $H_{01} : \beta_2 = \beta_2^0$, $H_{02} : \beta_2 - \beta_3 = \delta_0$ and $H_{03} : E(x_1) = \eta^0$ for $x_i = x_i^0$ ($i = 2, 3, \ldots, p$).

How are these tests to be modified in case $x_2, x_3, \ldots, x_p$ too are stochastic ?

18.5 (Continuation) Set confidence limits to $\beta_1$, $\beta_2$ and $E(x_1)$ for $x_i = x_i^0$, and also prediction limits to $x_1$ for $x_i = x_i^0$ ($i = 2, 3, \ldots, p$).

What modifications are needed in case $x_2, x_3, \ldots, x_p$ are stochastic ?

18.6 (Continuation) How would you test for the hypothesis $H_0 : \beta_1 = \beta_1^0$ ? Consider the case where $x_2, x_3, \ldots, x_p$ are non-stochastic as well as the case where they are stochastic.

18.7 Describe the $P_\lambda$-test. What purpose does it serve ?

18.8 What are outliers ? Describe some tests for outliers, clearly stating the underlying assumptions.

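18.9 How would you test whether the distribution from which a given set of univariate observations has been obtained as a random sample is a normal distribution ?

18.10 For the set-up of Exercise 18.3, show that

$$\sum_{\alpha}(x_{1\alpha} - b_1 - b_2 x_{2\alpha} - \cdots - b_p x_{p\alpha})^2 = \sum_{\alpha} x_{1\alpha}^2 - b_1\sum_{\alpha} x_{1\alpha} - b_2\sum_{\alpha} x_{1\alpha}x_{2\alpha} - \cdots - b_p\sum_{\alpha} x_{1\alpha}x_{p\alpha} = S_{11} - \sum_{i=2}^{p} b_i S_{i1}$$

and also

$$= S_{11} - \sum_{i=2}^{p}\sum_{j=2}^{p} b_i b_j S_{ij}.$$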


18.11 For two sets of data on $p$ variables $x_1, x_2, \ldots, x_p$ as in Exercise 18.3, describe how you would make comparisons between the regression equations.

18.12 (Data of Steel and Torrie) For 30 samples of tobacco taken from farmers' fields, the percentage of nitrogen ($x_2$), percentage of chlorine ($x_3$), percentage of potassium ($x_4$) and the log of leaf burn in sec. ($x_1$) were recorded. These gave

$\sum x_{1\alpha} = 20.58$, $\sum x_{2\alpha} = 98.96$, $\sum x_{3\alpha} = 24.23$, $\sum x_{4\alpha} = 139.6$,
$\sum x_{1\alpha}^2 = 20.8074$, $\sum x_{1\alpha}x_{2\alpha} = 61.6502$, $\sum x_{1\alpha}x_{3\alpha} = 12.4103$, $\sum x_{1\alpha}x_{4\alpha} = 98.4408$,
$\sum x_{2\alpha}^2 = 332.3352$, $\sum x_{2\alpha}x_{3\alpha} = 81.5834$, $\sum x_{2\alpha}x_{4\alpha} = 459.4052$,
$\sum x_{3\alpha}^2 = 30.1907$, $\sum x_{3\alpha}x_{4\alpha} = 120.3950$, $\sum x_{4\alpha}^2 = 682.7813$.

(a) Fit to the data a least-square linear regression equation of $x_1$ on $x_2$, $x_3$ and $x_4$. Hence test whether the partial regression coefficients of $x_1$ on $x_2$ and on $x_3$ are significantly different.

(b) Set 95% confidence limits to the true partial regression coefficient of $x_1$ on $x_4$.

(c) For $x_2 = 3.5$, $x_3 = 1.0$ and $x_4 = 4.2$, set 95% confidence limits to the population mean of $x_1$.

Partial ans. (a) $X_1 = 0.55934 - 0.12703x_2 - 0.51755x_3 + 0.20654x_4$ ; $t = 3.082$.
(b) 0.0686 and 0.3445.
(c) 0.2951 and 0.6342.

18.13 For the data of Example 13.1, the multiple correlation of weight of dry bark ($x_1$) on height ($x_2$) and girth at a height of 6″ ($x_3$) was found to be 0.854. Test whether this value may be supposed to have arisen in sampling from a distribution where the multiple correlation is zero.
Partial ans. $F = 20.22$ with df $= (2, 15)$.

18.14 The I.Q.'s for random samples of 15-year-old students of 4 schools in a city are given below :

    School A    School B    School C    School D

    91.2        109.1       79.2        83.3
    85.3        97.2        95.1        95.7
    87.7        88.6        97.0        107.8
    102.4       102.7       88.6        102.3
    95.5        89.1        105.5
    111.3       112.6
    96.2

Test whether the mean I.Q. differs significantly from school to school.
Partial ans. $F = 1.800$ with df $= (3, 18)$.


18.15 For plants of six varieties of tomato grown under similar conditions the yield per plant was recorded (in gm.). The number of plants and the variance of yield per plant are given below for each variety :

    Variety    Number of plants    (Unbiassed estimate of) variance
    1          29                  10053
    2          35                  9720
    3          18                  8007
    4          28                  11274
    5          40                  9908
    6          52                  7881

Examine whether the variances differ significantly.
Partial ans. $M = 0.495$.


18.16 The following figures are the sitting heights (in cm.) of 20 adults of an Indian tribe :

    84.33    86.12    103.21    86.11
    85.01    85.91    89.12     85.73
    84.35    84.23    85.29     86.17
    86.21    85.71    84.77     85.21
    85.72    82.25    85.21     84.95

Test whether the figure 103.21 is consistent with the other figures (assuming that the population s.d. is 3.21 cm.).
Partial ans. $u_n = 5.274$.

18.17 In order to test whether the proportion of people with pulmonary complaints was the same among smokers as among non-smokers or was greater, 5 experimenters used, on their separate sets of data, the large-sample normal deviate test for two proportions as described in Chapter 19. The values of $\tau$ obtained by them were 3.58, 1.25, -0.04, 3.23 and 2.14. Noting that the hypothesis is to be rejected in any of these cases if, and only if, the corresponding $\tau$ value is too large, combine the results of the 5 experimenters to judge whether the proportions are equal in the population.
Partial ans. $P_\lambda = 43.655$ (df $= 10$).

18.18 The yield (in gm.) for each of 15 soyabean plants of a particular variety is obtained as follows :

    38    15     9
    67    57    42
    12    46    29
    29    34    31
    16    25    20

Test whether the variable may be assumed to be normally distributed in the population.
Partial ans. $\sqrt{b_1} = 0.627$, $a = 0.797$.


SUGGESTED READING 


[1] Barnett, V. and Lewis, T. Outliers in Statistics (Chs. 1-3). John Wiley, 1978.

[2] Dixon, W. J. and Massey, F. J. Introduction to Statistical Analysis (Chs. 10, 11, 16). McGraw-Hill, 1969, and Kogakusha.

[3] Goulden, C. H. Methods of Statistical Analysis (Chs. 5-8). Asia Publishing House, 1959.

[4] Hald, A. Statistical Theory with Engineering Applications (Chs. 12, 16, 19). John Wiley, 1952.

[5] Johnson, N. L. and Leone, F. C. Statistics and Experimental Design, Vol. I (Chs. 8, 12). John Wiley, 1964.

[6] Keeping, E. S. Introduction to Statistical Inference (Chs. 9, 12, 13). Van Nostrand, 1962, and Affiliated East-West Press.

[7] Mood, A. M., Graybill, F. A. and Boes, D. C. Introduction to the Theory of Statistics (Ch. 10). McGraw-Hill, 1974, and Kogakusha.

[8] Pearson, E. S. and Hartley, H. O. Biometrika Tables, Vol. I (Introduction : Sections 5⁄4, 92, 13, 14, 16). Cambridge University Press, 1958.

[9] Rao, C. R. Advanced Statistical Methods for Biometric Research (Chs. 3, 6). John Wiley, 1952.


19 APPROXIMATE TESTS AND CONFIDENCE INTERVALS


19.1 Introduction 

The results obtained in the three preceding chapters are exact 
in the sense that the probability connected with any test of signi- 
ficance or any confidence interval is exact, provided, of course, the 
underlying assumption regarding the form of the distribution is 
in each case satisfied. These results are valid irrespective of the 
sample size. 

In the present chapter, on the other hand, we shall consider 
some approximate results which are valid only for sufficiently large 
samples. This is, no doubt, a limitation ; but otherwise these have 
wider applicability because they hold for all distributions which 
satisfy certain general conditions, rather than being valid for some 
particular type of distribution (like the normal). Furthermore, the 
results are comparatively easy to apply in actual hypothesis-testing 
or in setting confidence limits to a parameter, requiring, as a rule, 
the use of the distribution of the normal deviate only. These 
approximate results are based on the following facts : 

In the first place, it has been found that the random sampling 
distributions of many statistics tend, for large samples, to a form 
either exactly or very nearly normal, except for some rare types of 
population distribution. 

Secondly, for large n, the expectation of a statistic will generally
be approximately equal to the corresponding parameter, while its
variance will be approximately of the form $\delta^2/n$, $\delta$ being a finite
quantity. Hence the larger the sample size, the more concentrated 
will be the distribution about the parameter and the smaller will be, 
on the average, the deviations between the values of the statistic and 
the corresponding parameter. It follows that any unknown para- 
meter can, in general, be well estimated by the value of the statistic 
found from a given sample, provided the sample size is large 
enough. 



Lastly, the characteristics of the sampling distribution of a
statistic, like its mean, moments, etc., may also be estimated from
the given sample.

Take, for instance, the sample mean $\bar{x}$ for a random sample from
the probability distribution of the variable x. It can be shown that
if the population variance $\sigma^2$ is finite, then the sampling
distribution of $\bar{x}$ tends to normality for large n; this is the
so-called Central Limit Theorem. Again, $\bar{x}$ has expectation $\mu$,
the population mean of x, and variance $\sigma^2/n$, so that in case
$\sigma^2$ is finite, the sample mean will serve as a good estimate of
the population mean, provided the sample size is sufficiently large.
Consider next the other characteristics (besides the mean) of the
sampling distribution of $\bar{x}$. The standard error $\sigma/\sqrt{n}$
may be taken for illustration. This will be well estimated by
$s/\sqrt{n}$, s being the sample standard deviation.

For a statistic T of this type, $(T-\theta)/\sigma_T$, where $\theta$ is the
corresponding parameter and $\sigma_T$ is the standard error of T, is,
therefore, approximately a normal deviate. If one has to test a
hypothesis regarding $\theta$, say,

$$H_0 : \theta = \theta_0,$$

one will utilise the fact that under $H_0$

$$(T-\theta_0)/\sigma_T \tag{19.1}$$

is approximately a normal deviate. If $\sigma_T$ be known, either a priori
or from the hypothesis, one would, therefore, compute T from the
sample, use it to calculate $(T-\theta_0)/\sigma_T$ and compare the resulting
value with $\tau_\alpha$, $-\tau_\alpha$ or $\tau_{\alpha/2}$, as the case may be (the level of
significance being approximately $\alpha$). If $\sigma_T$ be known a priori, the
confidence limits for $\theta$ will be the observed values of

$$T - \tau_{\alpha/2}\,\sigma_T \quad\text{and}\quad T + \tau_{\alpha/2}\,\sigma_T. \tag{19.2}$$

The confidence coefficient will be $1-\alpha$ approximately.

In case $\sigma_T$ is unknown, one can substitute its sample value, say,
$\hat{\sigma}_T$, and still make the test for $H_0$ by taking

$$(T-\theta_0)/\hat{\sigma}_T \tag{19.3}$$

as approximately a normal deviate.

The confidence limits to $\theta$ will now be the observed values of

$$T - \tau_{\alpha/2}\,\hat{\sigma}_T \quad\text{and}\quad T + \tau_{\alpha/2}\,\hat{\sigma}_T. \tag{19.4}$$


A question that naturally arises is: Precisely how large should 
n be for such approximations to be valid? An answer to this 
question will depend on the nature of the population from which 
the samples are being taken, on the nature of the statistic and, of 
course, on the degree of accuracy aimed at. Some practical rules 
may, however, be suggested. In case one is dealing with sample 
means or sample proportions, such approximations will usually 
be good if n>30. In the case of sample medians, variances, coeffi- 
cients of skewness and kurtosis, correlation coefficients (population 
correlation being in the neighbourhood of zero), it is necessary that 
n should be at least about 100. For sample correlation coefficients 
when population correlation is considerably different from zero, 
it is found that even samples of 300 do not give satisfactory 
approximation. 

Example 19.1 For 150 beans of a particular variety, the mean
and standard deviation of breadth of bean were found to be
$\bar{x} = 8.512$ mm. and $s = 0.616$ mm. Test if the observed mean differs
significantly from 8 mm.

The null hypothesis here is $H_0 : \mu = 8$, where $\mu$ is the mean
breadth per bean in the population. Since the sample size in this case
is quite large, on the assumption that the n=150 observations form
a random sample, an approximate test for $H_0$ will be provided by
the statistic $\sqrt{n}(\bar{x}-8)/s$, which is, under $H_0$, approximately a
normal deviate $\tau$. For the given sample,

$$\tau = \frac{\sqrt{150}\,(8.512-8)}{0.616} = \frac{12.247 \times 0.512}{0.616} = 10.179.$$

In the present case, we are to use a two-sided test since the
alternative hypothesis is $H : \mu \neq 8$. Since 10.179 is greater than
$\tau_{.025} = 1.960$ as well as $\tau_{.005} = 2.576$, the null hypothesis is to be
rejected. The observed mean thus differs significantly from 8 mm.




Example 19.2 With the data of Example 19.1, we may also have
confidence limits for the unknown population mean. If the chosen
confidence coefficient is 0.95, then the corresponding limits will
be approximately

$$\bar{x} - 1.960\, s/\sqrt{n} = 8.512 - 1.960 \times 0.616/\sqrt{150} = 8.512 - 1.960 \times 0.616/12.247 = 8.512 - 0.099 = 8.413 \text{ mm.}$$

and

$$\bar{x} + 1.960\, s/\sqrt{n} = 8.512 + 0.099 = 8.611 \text{ mm.}$$

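The arithmetic of Examples 19.1 and 19.2 may be checked with a short Python sketch. It is offered only as an illustration: the function name `large_sample_mean_test` and its arguments are our own and do not come from the text, and the normal quantile is taken from the standard library's `statistics.NormalDist`.

```python
from math import sqrt
from statistics import NormalDist


def large_sample_mean_test(xbar, s, n, mu0, conf=0.95):
    """Large-sample test of H0: mu = mu0 and confidence limits for mu.

    Treats sqrt(n) * (xbar - mu0) / s as an approximate normal deviate,
    i.e. (19.3) and (19.4) with T = xbar and estimated s.e. s / sqrt(n).
    """
    se = s / sqrt(n)                                  # estimated standard error
    tau = (xbar - mu0) / se                           # approximate normal deviate
    tau_half = NormalDist().inv_cdf(0.5 + conf / 2)   # tau_{alpha/2}
    return tau, (xbar - tau_half * se, xbar + tau_half * se)


tau, (lower, upper) = large_sample_mean_test(8.512, 0.616, 150, mu0=8.0)
print(f"tau = {tau:.3f}")                          # about 10.18
print(f"95% limits: {lower:.3f} and {upper:.3f}")  # about 8.413 and 8.611
```

The small difference from the printed value 10.179 arises only because the text rounds $\sqrt{150}$ to 12.247 before multiplying.
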
19.2 Tests and confidence intervals for proportions 

Suppose in a population p is the proportion of members with a
characteristic A. If a random sample of size n be drawn from this
population, the n drawings being mutually independent, and if we
denote by f the number of members of the sample who possess the
characteristic A, then the sampling distribution of f will be of the
binomial form, with p.m.f.

$$\binom{n}{f} p^f (1-p)^{n-f}.$$

The exact treatment of such samples will, therefore, necessitate the
use of the binomial distribution. However, if n is sufficiently large,
we can use, instead, the normal distribution. For, as we have stated
earlier (in Chapter 10), a binomial distribution tends to the normal
form for large n, provided p is not very nearly equal to zero or
unity. As a working rule, one may insist that p should lie between
9/n and (n-9)/n. (The reason will be apparent from what we say
in Section 19.6.)

Since f has mean np and variance np(1-p), one may, therefore, use

$$\frac{f-np}{\sqrt{np(1-p)}} = \frac{\sqrt{n}\,(\hat{p}-p)}{\sqrt{p(1-p)}},$$

where $\hat{p} = f/n$ is the sample proportion of A, approximately as a
normal deviate.

Suppose it is required to test the hypothesis $H_0 : p = p_0$ at the
level of significance $\alpha$. Under this hypothesis,

$$\frac{\sqrt{n}\,(\hat{p}-p_0)}{\sqrt{p_0(1-p_0)}} \tag{19.5}$$

is approximately a normal deviate. To test the hypothesis one will,
therefore, compute the above quantity from the given sample. (a) If
the question is whether p is equal to $p_0$ or greater, $H_0$ will be rejected
if the computed value exceeds $\tau_\alpha$ (and it will be accepted otherwise).
(b) Secondly, if one is interested to know whether p is equal to $p_0$
or smaller, $H_0$ will be rejected when the computed value is found to
be smaller than $-\tau_\alpha$. (c) When the question is whether p is or is
not equal to $p_0$, the hypothesis will be rejected if the computed value
exceeds $\tau_{\alpha/2}$ or is smaller than $-\tau_{\alpha/2}$.

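Next, in order to get a pair of confidence limits for p with
confidence coefficient $1-\alpha$, we see that although the exact variance
of $\hat{p}$ is $p(1-p)/n$, if n is large enough, p in this expression may be
replaced by its sample value $\hat{p}$. Hence we have approximately

$$P\left[-\tau_{\alpha/2} \le \frac{\sqrt{n}\,(\hat{p}-p)}{\sqrt{\hat{p}(1-\hat{p})}} \le \tau_{\alpha/2}\right] = 1-\alpha$$

or

$$P\left[\hat{p} - \tau_{\alpha/2}\sqrt{\hat{p}(1-\hat{p})/n} \le p \le \hat{p} + \tau_{\alpha/2}\sqrt{\hat{p}(1-\hat{p})/n}\right] = 1-\alpha.$$

Hence for the given sample the observed values of

$$\hat{p} - \tau_{\alpha/2}\sqrt{\hat{p}(1-\hat{p})/n} \quad\text{and}\quad \hat{p} + \tau_{\alpha/2}\sqrt{\hat{p}(1-\hat{p})/n} \tag{19.6}$$

will be confidence limits to p with confidence coefficient
approximately equal to $1-\alpha$.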

Suppose now that we have two populations, the proportion of A
being $p_1$ in the one and $p_2$ in the other. Let random samples of
sizes $n_1$ and $n_2$, respectively, be obtained from the first and the
second population through independent drawings, and let $\hat{p}_1 = f_1/n_1$
and $\hat{p}_2 = f_2/n_2$ be the two sample proportions of A. We have then

$$E(\hat{p}_1-\hat{p}_2) = E(\hat{p}_1) - E(\hat{p}_2) = p_1 - p_2$$

and

$$\mathrm{var}(\hat{p}_1-\hat{p}_2) = \mathrm{var}(\hat{p}_1) + \mathrm{var}(\hat{p}_2) = \frac{p_1(1-p_1)}{n_1} + \frac{p_2(1-p_2)}{n_2}.$$

Further, as is obvious from the preceding discussion, $\hat{p}_1-\hat{p}_2$ will also
be approximately normal when $n_1$ and $n_2$ are sufficiently large.
Hence in that case

$$\frac{(\hat{p}_1-\hat{p}_2) - (p_1-p_2)}{\sqrt{\dfrac{p_1(1-p_1)}{n_1} + \dfrac{p_2(1-p_2)}{n_2}}}$$

will be distributed as approximately a normal deviate.

Consider then the hypothesis

$$H_0 : p_1 = p_2.$$

According to this hypothesis,

$$E(\hat{p}_1-\hat{p}_2) = 0$$

and

$$\mathrm{var}(\hat{p}_1-\hat{p}_2) = p(1-p)\left(\frac{1}{n_1}+\frac{1}{n_2}\right),$$

where p is the common value of $p_1$ and $p_2$. If p were given by the
hypothesis, one would, therefore, use

$$\tau = \frac{\hat{p}_1-\hat{p}_2}{\sqrt{p(1-p)\left(\dfrac{1}{n_1}+\dfrac{1}{n_2}\right)}} \tag{19.7}$$

for testing $H_0$.

But here, as is usually the case, p is unknown and has to be
estimated from the data. The proper estimate will be the proportion
of A in the two samples taken together, i.e.

$$\hat{p} = \frac{f_1+f_2}{n_1+n_2}.$$

To test $H_0$ one would, therefore, compute from the samples

$$\tau = \frac{\hat{p}_1-\hat{p}_2}{\sqrt{\hat{p}(1-\hat{p})\left(\dfrac{1}{n_1}+\dfrac{1}{n_2}\right)}} \tag{19.8}$$

and compare it with the appropriate tabulated value of the normal
deviate for the acceptance or rejection of the hypothesis.

Example 19.3 An antibiotic is claimed to cure at least 90% of
cases of tuberculosis. Eighty T. B. patients are treated with the
antibiotic and out of them 59 get cured. Do you consider the claim to
be justified ?

The null hypothesis in this case is $H_0 : p = 0.9$ (where p is the
proportion of patients whom the antibiotic is expected to cure),
which is to be tested against the alternative $H : p < 0.9$. Under the
usual assumptions, the test is given by the statistic

$$\frac{\sqrt{n}\,(\hat{p}-0.9)}{\sqrt{0.9 \times 0.1}},$$

which may be supposed to be approximately a normal deviate under
$H_0$, since here n is fairly large.

For the given sample,

$$\hat{p} = \frac{59}{80} = 0.7375.$$

Hence

$$\tau = \frac{\sqrt{80}\,(0.7375-0.9)}{\sqrt{0.9 \times 0.1}} = -\frac{8.944 \times 0.1625}{0.3} = -1.453/0.3 = -4.843.$$

This is smaller than $-\tau_{.05} = -1.645$ as well as $-\tau_{.01} = -2.326$.
As such, the null hypothesis is to be rejected ; i.e., the claim that the
antibiotic cures at least 90% of cases of T. B. does not seem to be
justified in the light of the data.
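
Example 19.3 can be reproduced, and the limits (19.6) obtained for the same data, with the following Python sketch. The function names `proportion_test` and `proportion_limits` are illustrative choices of ours, not notation from the text; only the standard library is assumed.

```python
from math import sqrt
from statistics import NormalDist


def proportion_test(f, n, p0):
    """Statistic (19.5): sqrt(n) * (p_hat - p0) / sqrt(p0 * (1 - p0))."""
    p_hat = f / n
    return sqrt(n) * (p_hat - p0) / sqrt(p0 * (1 - p0))


def proportion_limits(f, n, conf=0.95):
    """Approximate confidence limits (19.6) for the population proportion."""
    p_hat = f / n
    half = NormalDist().inv_cdf(0.5 + conf / 2) * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half, p_hat + half


tau = proportion_test(f=59, n=80, p0=0.9)
p_lower, p_upper = proportion_limits(f=59, n=80)
print(f"tau = {tau:.3f}")                                   # about -4.84
print(f"95% limits for p: {p_lower:.3f} and {p_upper:.3f}")
```

Unrounded arithmetic gives a value near -4.845 rather than the printed -4.843; the conclusion is unaffected.
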
Example 19.4 An investigation of the performance of two
machines, in a factory manufacturing large numbers of bobbins,
gives the following results :

             No. of bobbins    No. of bobbins
               examined        found defective
Machine 1        375                17
Machine 2        450                22

Test whether there is any significant difference in the performance of
the two machines.

The two machines may be said to be significantly different in
their performance if the proportion of defective bobbins for Machine 1
is different from the proportion of defective bobbins for Machine 2.
Thus we have to test the null hypothesis $H_0 : p_1 = p_2$ against the
alternative $H : p_1 \neq p_2$.

Here the sample proportions are

$$\hat{p}_1 = \frac{f_1}{n_1} = \frac{17}{375} = 0.04533$$

and

$$\hat{p}_2 = \frac{f_2}{n_2} = \frac{22}{450} = 0.04889,$$

while the pooled proportion is

$$\hat{p} = \frac{f_1+f_2}{n_1+n_2} = \frac{17+22}{375+450} = 0.04727.$$

Hence

$$\tau = \frac{\hat{p}_1-\hat{p}_2}{\sqrt{\hat{p}(1-\hat{p})\left(\dfrac{1}{n_1}+\dfrac{1}{n_2}\right)}},$$

which is approximately a normal deviate under $H_0$, has the value

$$\frac{0.04533-0.04889}{\sqrt{0.04727 \times 0.95273 \times \left(\dfrac{1}{375}+\dfrac{1}{450}\right)}} = -\frac{0.00356}{\sqrt{0.00022018}} = -\frac{0.00356}{0.01484} = -0.240.$$

Since the observed value of the normal deviate is numerically
smaller than both $\tau_{.005} = 2.576$ and $\tau_{.025} = 1.960$, the null hypothesis
ought to be accepted. In other words, the given figures do not
indicate any significant difference in the performance of the two
machines.
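
The computation in Example 19.4 may be sketched in Python as follows. The function name `two_proportion_test` is our own; the code simply evaluates (19.8) with the pooled proportion.

```python
from math import sqrt


def two_proportion_test(f1, n1, f2, n2):
    """Statistic (19.8), using the pooled estimate of the common proportion."""
    p1, p2 = f1 / n1, f2 / n2
    pooled = (f1 + f2) / (n1 + n2)                        # proportion in both samples together
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))  # estimated s.e. under H0
    return (p1 - p2) / se


tau = two_proportion_test(17, 375, 22, 450)
print(f"tau = {tau:.3f}")   # about -0.240
```
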


19.3 Approximate tests and confidence limits for Poisson
parameters

Let $x_1, x_2, \ldots, x_n$ be a random sample from a Poisson distribution
with unknown parameter $\lambda$. An approximate test or confidence
interval for $\lambda$ can be obtained from the fact that the sufficient
statistic

$$y = \sum_{i=1}^{n} x_i$$

is approximately normally distributed with mean and variance both
equal to $n\lambda$, provided $n\lambda$ is sufficiently large.

A test for

$$H_0 : \lambda = \lambda_0$$

will then be given by the statistic

$$(y - n\lambda_0)/\sqrt{n\lambda_0},$$

which is approximately normally distributed under $H_0$.
Similarly, the fact that approximately

$$P\left[-\tau_{\alpha/2} \le \frac{y-n\lambda}{\sqrt{n\lambda}} \le \tau_{\alpha/2}\right] = 1-\alpha$$

will provide us with confidence limits to $\lambda$, the associated confidence
coefficient being approximately $1-\alpha$.

Again, we may be interested in a comparison among k Poisson
distributions with unknown parameters $\lambda_i$ (i = 1, 2, ......, k). If
$y_i$ (i = 1, 2, ......, k) be the totals, for the k distributions, of
independent random samples of sizes $n_i$ taken from them, then

$$\sum_{i=1}^{k} \frac{(y_i - n_i\lambda_i)^2}{n_i\lambda_i}$$

is approximately a $\chi^2$ with df=k. Hence to test for the hypothesis

$$H_0 : \lambda_1 = \lambda_2 = \cdots = \lambda_k,$$

we could use the statistic

$$\sum_{i=1}^{k} \frac{(y_i - n_i\lambda)^2}{n_i\lambda} \quad (\text{with df}=k) \tag{19.9}$$

if the common value $\lambda$ were given. But $\lambda$ will in most cases be
unspecified and will have to be estimated from the data. The
maximum-likelihood estimate is

$$\hat{\lambda} = \sum_{i=1}^{k} y_i \Big/ \sum_{i=1}^{k} n_i.$$

If we replace $\lambda$ by $\hat{\lambda}$ in (19.9), we may still use

$$\sum_{i=1}^{k} \frac{(y_i - n_i\hat{\lambda})^2}{n_i\hat{\lambda}} \quad (\text{with df}=k-1) \tag{19.10}$$

for an appropriate test for $H_0$. This is distributed approximately
as a $\chi^2$ with df=k-1, the loss of one degree of freedom resulting
from the estimation of $\lambda$ by $\hat{\lambda}$.
Example 19.5 In a study relating to the traffic conditions in a
city, the average daily numbers of motor car accidents during April,
1962 were found to be as follows :

Zone        Average daily number
North               17
East                13
South               10
West                12
Central             14

Do you think that the traffic problem is equally acute in all five
zones ?

Denoting the average daily number of accidents in zone i by $x_i$, we
see that $30x_i$, which is the number of accidents for the whole month,
may be supposed to be a Poisson variable, say with parameter $\lambda_i$.
The hypothesis to be tested is

$$H_0 : \lambda_1 = \lambda_2 = \cdots = \lambda_5.$$

Under this hypothesis,

$$\sum_i (30x_i - 30\bar{x})^2/30\bar{x} = 30\sum_i (x_i-\bar{x})^2/\bar{x} = \frac{30}{\bar{x}}\left(\sum_i x_i^2 - 5\bar{x}^2\right)$$

is approximately a $\chi^2$ with df=4.

For the data, this statistic has the value

$$\frac{30}{13.2}\left[898 - 5 \times (13.2)^2\right] = \frac{30 \times 26.8}{13.2} = 60.909.$$

Since this exceeds both $\chi^2_{.05,\,4} = 9.488$ and $\chi^2_{.01,\,4} = 13.277$, $H_0$ is to
be rejected. Hence the traffic problem does not seem to be equally
acute in all five zones.
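
A minimal Python sketch of the statistic (19.10), applied to the data of Example 19.5, is given below. The function name `poisson_homogeneity_chi2` is our own. Following the example, each zone's monthly total is treated as a single Poisson observation, so every sample size is taken as 1; that choice is an assumption of the sketch.

```python
def poisson_homogeneity_chi2(totals, sizes):
    """Statistic (19.10): sum of (y_i - n_i*lam)^2 / (n_i*lam), with df = k - 1."""
    lam_hat = sum(totals) / sum(sizes)              # maximum-likelihood estimate
    chi2 = sum((y - n * lam_hat) ** 2 / (n * lam_hat)
               for y, n in zip(totals, sizes))
    return chi2, len(totals) - 1


daily_means = [17, 13, 10, 12, 14]                  # North, East, South, West, Central
monthly_totals = [30 * x for x in daily_means]      # April has 30 days
chi2, df = poisson_homogeneity_chi2(monthly_totals, [1] * len(monthly_totals))
print(f"chi-square = {chi2:.3f} with df = {df}")    # about 60.909 with df = 4
```
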


19.4 Approximate standard error formulae of some statistics

The formulae for standard error as well as those for expectation
for sample mean and sample proportion, which we have already
come across, are exact and of a simple nature. When we pass on to
other statistics, we find that the exact formulae are of a complicated
nature and in some cases are difficult to derive. In the large-sample
case, however, we need not know these exact expressions :
approximate formulae serve our purpose.

We give below such approximate formulae for the standard errors
(rather, for the variances) of some of the more important statistics.
As regards their expectations, it may be noted that the expectation
of every statistic considered here may be taken to be approximately
equal to the corresponding parameter. (In the derivation of these
formulae, it has been assumed that the available observations are
a random sample from the relevant distribution.)

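These results stem from the fact that if $T_1, T_2, \ldots, T_k$ are k
statistics and $\theta_1, \theta_2, \ldots, \theta_k$ are the corresponding parameters, then
for a sufficiently well-behaved function $\psi(T_1, T_2, \ldots, T_k)$, one has

$$\psi(T_1, T_2, \ldots, T_k) \simeq \psi(\theta_1, \theta_2, \ldots, \theta_k) + \sum_{i=1}^{k} (T_i-\theta_i)\left(\frac{\partial\psi}{\partial T_i}\right)_0$$

in the neighbourhood of the point $T_1=\theta_1, T_2=\theta_2, \ldots, T_k=\theta_k$,
$\left(\partial\psi/\partial T_i\right)_0$ being the value of the partial derivative of $\psi$ with respect
to $T_i$ at this point.

Hence if $E(T_i) \simeq \theta_i$ for each i and $\mathrm{var}(T_i)$, $\mathrm{cov}(T_i, T_j)$ are
O(1/n)*, then

$$E[\psi(T_1, T_2, \ldots, T_k)] \simeq \psi(\theta_1, \theta_2, \ldots, \theta_k) \tag{19.11a}$$

and

$$\mathrm{var}[\psi(T_1, T_2, \ldots, T_k)] \simeq \sum_{i} \left(\frac{\partial\psi}{\partial T_i}\right)_0^2 \mathrm{var}(T_i) + 2\sum_{i<j} \left(\frac{\partial\psi}{\partial T_i}\right)_0 \left(\frac{\partial\psi}{\partial T_j}\right)_0 \mathrm{cov}(T_i, T_j). \tag{19.11}$$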
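
The variance formula (19.11) can be evaluated numerically when the derivatives are awkward to write down. The Python sketch below is only an illustration under our own assumptions: the function name `delta_method_variance`, the use of central differences for the partial derivatives, and the numerical values of the standard deviation and sample size are not from the text. As a check it takes the square root of the sample variance in the normal case, for which the variances (19.13b) and (19.14) are quoted in this section.

```python
def delta_method_variance(psi, theta, cov, h=1e-6):
    """Approximate var[psi(T_1, ..., T_k)] as in (19.11).

    psi   : function taking a list of k numbers
    theta : parameter values, the point at which derivatives are evaluated
    cov   : k x k covariance matrix of the statistics T_i (list of lists)
    """
    k = len(theta)
    grad = []
    for i in range(k):
        up = list(theta)
        down = list(theta)
        up[i] += h
        down[i] -= h
        grad.append((psi(up) - psi(down)) / (2 * h))   # central difference
    # The full double sum equals the squared terms plus twice the cross terms.
    return sum(grad[i] * grad[j] * cov[i][j]
               for i in range(k) for j in range(k))


sigma, n = 2.0, 100                        # assumed values, for illustration only
var_s2 = 2 * sigma ** 4 / n                # (19.13b): variance of s^2, normal case
var_s = delta_method_variance(lambda t: t[0] ** 0.5, [sigma ** 2], [[var_s2]])
print(round(var_s, 6))                     # close to sigma^2 / (2n) = 0.02, as in (19.14)
```
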


Central moments: If $m_r$ be the rth central moment of the sample
and $\mu_r$ the corresponding population moment, then

$$\mathrm{var}(m_r) \simeq \frac{1}{n}\left(\mu_{2r} - \mu_r^2 - 2r\mu_{r-1}\mu_{r+1} + r^2\mu_{r-1}^2\mu_2\right). \tag{19.12}$$

In particular, for the sample variance $s^2$, we have

$$\mathrm{var}(s^2) \simeq \frac{\mu_4 - \mu_2^2}{n}. \tag{19.13a}$$

If the population is normal, then $\mu_4 = 3\mu_2^2$, so that

$$\mathrm{var}(s^2) \simeq \frac{2\sigma^4}{n}. \tag{19.13b}$$

If we consider, instead of the sample variance, the sample
standard deviation s, it may be proved that in the normal population
distribution case,

$$\mathrm{var}(s) \simeq \frac{\sigma^2}{2n}. \tag{19.14}$$

$g_1$ and $g_2$ coefficients : In sampling from a normal distribution,

$$\mathrm{var}(g_1) \simeq 6/n \tag{19.15}$$

and

$$\mathrm{var}(g_2) \simeq 24/n. \tag{19.16}$$


Coefficient of variation: Let us denote by v and V the sample and
population coefficient of variation, respectively, of x. If x is normally
distributed in the population, then

$$\mathrm{var}(v) \simeq \frac{V^2}{2n}\left[1 + \frac{2V^2}{10^4}\right]. \tag{19.17}$$


*O(1/n) is a quantity such that $\lim_{n\to\infty} n \times O(1/n)$ is finite, e.g. $5/n + 2/n^2$.


Sample correlation coefficient: If r be the sample correlation of the
variables x and y which are distributed in the bivariate normal form in the
population with correlation coefficient $\rho$, then

$$\mathrm{var}(r) \simeq \frac{(1-\rho^2)^2}{n}. \tag{19.18}$$

Sample quantiles: Let $z_p$ be the sample quantile of order p of the
variable x and let $\zeta_p$ be the corresponding population quantile, then

$$\mathrm{var}(z_p) \simeq \frac{p(1-p)}{n[f(\zeta_p)]^2}. \tag{19.19}$$

Here x is supposed to be continuously distributed with p.d.f.
f(x), so that $f(\zeta_p)$ is the probability-density at $x=\zeta_p$.

For the median, p=1/2. Hence the variance of the sample median
is, approximately,

$$\frac{1}{4n[f(\zeta_{1/2})]^2}.$$


In case x has a normal distribution with variance $\sigma^2$,

$$f(\zeta_{1/2}) = \frac{1}{\sigma\sqrt{2\pi}}.$$

Therefore, in sampling from a normal parent, the variance of the
sample median is approximately

$$\frac{\pi\sigma^2}{2n}. \tag{19.20}$$

We find incidentally that although both the sample mean and
the sample median in the case of a normal population have
expectation $\mu$, the population mean, exactly or approximately, the
variance of the latter is about 1.57 times the variance of the former.
Hence it is considered advisable to take the sample mean, rather than
the sample median, as the proper estimate of $\mu$ (vide Example 16.4).
The use of some of these formulae is illustrated in the following
examples.

Example 19.6 The standard deviation of life in hours per bulb
for a sample of 150 electric bulbs, taken from those produced by a
factory in a particular year, was found to be 464 hours. The
standard deviation of the same variable for 175 electric bulbs, taken
from those produced in the following year, was 653 hours. Test if
there has been a significant change in the variability of life of bulbs.



In the present case we have to test (against the alternative
$H : \sigma_1 \neq \sigma_2$) $H_0 : \sigma_1 = \sigma_2$, where $\sigma_1$ and $\sigma_2$ are the standard deviations
of life of bulb for all bulbs produced in the first and second years,
respectively.

Assuming that the variable is normally distributed in each
population and that the samples are random and mutually independent,
the test may be performed by means of

either
$$\frac{s_1 - s_2}{\sigma\sqrt{\dfrac{1}{2n_1}+\dfrac{1}{2n_2}}}$$
or
$$\frac{s_1^2 - s_2^2}{\sigma^2\sqrt{\dfrac{2}{n_1}+\dfrac{2}{n_2}}},$$

each of which is approximately a normal deviate under $H_0$*.

Method 1. The common value $\sigma^2$ of the two population variances
being unknown, we estimate it by

$$s^2 = \frac{n_1 s_1^2 + n_2 s_2^2}{n_1+n_2}.$$

If we compare the two sample standard deviations, then the test
is given by

$$\tau = \frac{s_1 - s_2}{s\sqrt{\dfrac{1}{2n_1}+\dfrac{1}{2n_2}}}.$$

For the given samples,

$$s_1 = 464, \quad s_2 = 653,$$

so that

$$s^2 = \frac{150 \times (464)^2 + 175 \times (653)^2}{150+175} = \frac{106{,}916 \times 10^3}{325} = 328{,}972$$

and s=574.

Here

$$\tau = \frac{-189}{574\sqrt{\dfrac{1}{300}+\dfrac{1}{350}}} = \frac{-189}{574 \times 0.0787} = -4.184.$$

Since this is numerically greater than both $\tau_{.025} = 1.960$ and
$\tau_{.005} = 2.576$, $H_0$ is to be rejected.

*The first statistic approaches normality more rapidly than the second and so is
to be preferred.

Method 2. $H_0$ may be tested by means of the statistic

$$\tau = \frac{s_1^2 - s_2^2}{s^2\sqrt{\dfrac{2}{n_1}+\dfrac{2}{n_2}}},$$

which is also distributed as approximately a normal deviate under
$H_0$. For the given samples,

$$\tau = \frac{215{,}296 - 426{,}409}{328{,}972\sqrt{0.024762}} = -\frac{211{,}113}{328{,}972 \times 0.1574} = -4.077,$$

which is almost equal to the value of $\tau$ obtained by using Method 1.
Thus both the tests lead to the rejection of the null hypothesis. In
other words, each of the tests indicates that there has been a
significant change in the variability of life of bulbs during the period.
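
Both methods of Example 19.6 can be checked with a few lines of Python. The function name `compare_sds` is our own; the code follows the pooled estimate and the two statistics described in the example, under the same normality assumption.

```python
from math import sqrt


def compare_sds(s1, n1, s2, n2):
    """Large-sample statistics for H0: sigma_1 = sigma_2 (normal populations assumed)."""
    pooled_var = (n1 * s1 ** 2 + n2 * s2 ** 2) / (n1 + n2)
    pooled_sd = sqrt(pooled_var)
    # Method 1: difference of the sample standard deviations
    tau_sd = (s1 - s2) / (pooled_sd * sqrt(1 / (2 * n1) + 1 / (2 * n2)))
    # Method 2: difference of the sample variances
    tau_var = (s1 ** 2 - s2 ** 2) / (pooled_var * sqrt(2 / n1 + 2 / n2))
    return tau_sd, tau_var


tau_sd, tau_var = compare_sds(464, 150, 653, 175)
print(f"Method 1: {tau_sd:.3f}")   # about -4.19
print(f"Method 2: {tau_var:.3f}")  # about -4.08
```

The values differ slightly in the last figure from those printed in the example because the text rounds s to 574 and the square roots to four decimal places.
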

Example 19.7 For 600 beans of a particular variety, the
frequency distribution of breadth (in mm.) has

$$g_1 = -0.128 \quad\text{and}\quad g_2 = 0.195.$$

Examine if the population distribution may be supposed to be
normal.

If the population distribution be really of the normal type, then
$\gamma_1 = 0$, $\gamma_2 = 0$. We should, therefore, test the hypotheses $H_{01} : \gamma_1 = 0$
and $H_{02} : \gamma_2 = 0$. On the assumption that the observations form a
random sample, the tests are given by

$$\tau_1 = \frac{g_1}{\sqrt{6/n}} \quad\text{and}\quad \tau_2 = \frac{g_2}{\sqrt{24/n}},$$

respectively, which are distributed as approximately normal deviates
under the null hypotheses. For the given sample,

$$\tau_1 = -0.128 \times \sqrt{100} = -1.28$$

and

$$\tau_2 = 0.195 \times \sqrt{25} = 0.975.$$

On comparing their absolute values with

$$\tau_{.025} = 1.960 \quad\text{and}\quad \tau_{.005} = 2.576,$$

we find that both the hypotheses are acceptable. The population
distribution, therefore, may be supposed to be of the normal form.
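
The two deviates of Example 19.7 follow directly from (19.15) and (19.16). A minimal Python sketch, with a function name of our own choosing, is:

```python
from math import sqrt


def normality_deviates(g1, g2, n):
    """Return g1 / sqrt(6/n) and g2 / sqrt(24/n), using (19.15) and (19.16)."""
    return g1 / sqrt(6 / n), g2 / sqrt(24 / n)


tau1, tau2 = normality_deviates(-0.128, 0.195, 600)
print(f"tau1 = {tau1:.3f}, tau2 = {tau2:.3f}")   # about -1.280 and 0.975
```
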




19.5 z-transformation of sample correlations and other
transformations

It has been noted in the previous section that in random sampling
from a bivariate normal distribution, the sample correlation r is
approximately normally distributed about the population correlation
$\rho$ with (approximate) variance $(1-\rho^2)^2/n$.

The sampling distribution of r tends to normality fairly rapidly
when $\rho$ is not very different from zero. However, when $\rho$ differs
widely from zero, e.g. when $\rho = \pm 0.7$, this sampling distribution
tends to normality so slowly that the use of the normal approximation
will not be advisable even if n is as large as 100.

For such values of $\rho$, it is advisable to use the transformation

$$z = \frac{1}{2}\ln\frac{1+r}{1-r} \tag{19.21}$$

introduced by Fisher. The new statistic z may be assumed to be
normally distributed even when n is as small as 10, although $\rho$ may be
widely different from zero. It has been shown that z has
approximate mean

$$\zeta = \frac{1}{2}\ln\frac{1+\rho}{1-\rho} \tag{19.21a}$$

and approximate variance

$$\frac{1}{n-3}. \tag{19.21b}$$

One can, therefore, test any hypothesis regarding $\rho$ or get
confidence limits for $\rho$ by using the statistic

$$\sqrt{n-3}\,(z-\zeta)$$

as approximately a normal deviate, even for moderately large values
of n.

Other transformations of this type are given by the following
functions :

(1) $\sin^{-1}\sqrt{f/n}$, where f is the observed number of successes in
a series of n Bernoullian trials with probability of success p, with

$$E(\sin^{-1}\sqrt{f/n}) \simeq \sin^{-1}\sqrt{p}, \tag{19.22a}$$
$$\mathrm{var}(\sin^{-1}\sqrt{f/n}) \simeq \frac{1}{4n} ; \tag{19.22b}$$

(2) $\sqrt{x}$, where x is a Poisson variable with parameter $\lambda$
(assumed large), with

$$E(\sqrt{x}) \simeq \sqrt{\lambda}, \tag{19.23a}$$
$$\mathrm{var}(\sqrt{x}) \simeq \frac{1}{4} ; \tag{19.23b}$$

(3) $\ln s^2$, where $s^2$ is the sample variance in sampling from a
normal distribution with variance $\sigma^2$, with

$$E(\ln s^2) \simeq \ln \sigma^2, \tag{19.24a}$$
$$\mathrm{var}(\ln s^2) \simeq \frac{2}{n}. \tag{19.24b}$$


These transformations have a two-fold merit. First, the trans- 
formed statistic tends to normality much more rapidly than the 
original statistic. Secondly, the transformed statistic has an asymptotic 
variance which is independent of population parameters, thus pro- 
viding a better test or confidence interval than the original statistic. 


Example 19.8 In Example 12.1 the correlation coefficient between
marks in statistics Hons. in college test and those in the subsequent
university examination, for 20 students, was found to be 0.727. What
can be said about the population correlation coefficient ?

Let us assume (a) that in the population the two variables are
jointly normally distributed and (b) that the 20 observations, on
which the observed correlation coefficient r is based, are a random
sample from the bivariate normal distribution. We can then have
confidence limits for the population correlation $\rho$.

We have, approximately,

$$P\left[-\tau_{.025} \le \sqrt{n-3}\,(z-\zeta) \le \tau_{.025}\right] = 0.95$$

or

$$P\left[z - \frac{\tau_{.025}}{\sqrt{n-3}} \le \zeta \le z + \frac{\tau_{.025}}{\sqrt{n-3}}\right] = 0.95.$$

Hence the 95% confidence limits for $\zeta$ are

$$z - \frac{\tau_{.025}}{\sqrt{n-3}} \quad\text{and}\quad z + \frac{\tau_{.025}}{\sqrt{n-3}}.$$

For the given sample,

$$z = \tfrac{1}{2}(\log 1.727 - \log 0.273)\ln 10 = \tfrac{1}{2}(0.23729 - \bar{1}.43616) \times 2.30259 = 0.92234,$$

so that the 95% confidence limits for $\zeta$ are

$$0.92234 - \frac{1.960}{\sqrt{17}} = 0.92234 - 0.47537 = 0.44697$$

and

$$0.92234 + \frac{1.960}{\sqrt{17}} = 0.92234 + 0.47537 = 1.39771.$$

Now, $\dfrac{1+\rho}{1-\rho} = e^{2\zeta} = \mathrm{antilog}(2\zeta \log e) = A$, say, or $\rho = \dfrac{A-1}{A+1}$.

When $\zeta = 0.44697$, $A = \mathrm{antilog}(2 \times 0.44697 \times 0.43429) = \mathrm{antilog}(0.38823) = 2.445$
and $\rho = 1.445/3.445 = 0.419$.

When $\zeta = 1.39771$, $A = \mathrm{antilog}(2 \times 1.39771 \times 0.43429) = \mathrm{antilog}(1.21402) = 16.369$
and $\rho = 15.369/17.369 = 0.885$.

The 95% confidence limits for $\rho$ are, therefore, 0.419 and 0.885.
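
With a computer the logarithm tables of Example 19.8 are unnecessary, since the transformation (19.21) is the inverse hyperbolic tangent and its inverse is the hyperbolic tangent. The Python sketch below is illustrative only; the function name `correlation_limits` is our own.

```python
from math import atanh, tanh, sqrt
from statistics import NormalDist


def correlation_limits(r, n, conf=0.95):
    """Confidence limits for rho via Fisher's z-transformation."""
    z = atanh(r)                                         # (19.21): 0.5 * ln((1+r)/(1-r))
    half = NormalDist().inv_cdf(0.5 + conf / 2) / sqrt(n - 3)
    return tanh(z - half), tanh(z + half)                # back-transform the limits for zeta


rho_lower, rho_upper = correlation_limits(0.727, 20)
print(f"{rho_lower:.3f} and {rho_upper:.3f}")            # about 0.419 and 0.885
```
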

Example 19.9 The correlation coefficient between head-length
and head-breadth is 0.324 for 90 Brahmins and is 0.278 for 130
Chattris. Test whether the two values differ significantly.

Denoting the correlation coefficient between the two characters in
the population of all Brahmins by $\rho_1$ and the corresponding coefficient
in the population of all Chattris by $\rho_2$, we have here as our null
hypothesis $H_0 : \rho_1 = \rho_2$, which is to be tested against the alternative
$H : \rho_1 \neq \rho_2$.

We shall assume that in each population the two characters are
distributed in the bivariate normal form. Further, the $n_1$ and $n_2$
pairs of sample observations taken from the two populations will be
supposed to form two independent random samples.

If we put $z_1 = \dfrac{1}{2}\ln\dfrac{1+r_1}{1-r_1}$ and $z_2 = \dfrac{1}{2}\ln\dfrac{1+r_2}{1-r_2}$, $r_1$ and $r_2$ being the
sample correlations for samples taken from the first and second
populations, then an approximate test for $H_0$ is provided by the statistic

$$\tau = \frac{z_1 - z_2}{\sqrt{\dfrac{1}{n_1-3}+\dfrac{1}{n_2-3}}},$$

which is approximately a normal deviate under $H_0$.

For the given samples,

$$z_1 = \tfrac{1}{2}(\log 1.324 - \log 0.676) \times 2.30259 = \tfrac{1}{2}(0.12189 - \bar{1}.82995) \times 2.30259 = 0.33611$$

and

$$z_2 = \tfrac{1}{2}(\log 1.278 - \log 0.722) \times 2.30259 = \tfrac{1}{2}(0.10653 - \bar{1}.85854) \times 2.30259 = 0.28551,$$

so that

$$\tau = \frac{0.33611 - 0.28551}{\sqrt{\dfrac{1}{87}+\dfrac{1}{127}}} = 0.364.$$

Since the value exceeds neither $\tau_{.005} = 2.576$ nor $\tau_{.025} = 1.960$,
the hypothesis is to be accepted. In other words, the difference
between the two sample correlation coefficients is to be regarded as
insignificant.
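
The statistic of Example 19.9 may be sketched in Python as follows; the function name `compare_correlations` is our own, and `atanh` again plays the part of the z-transformation.

```python
from math import atanh, sqrt


def compare_correlations(r1, n1, r2, n2):
    """Approximate normal deviate for H0: rho_1 = rho_2, using z-transformed values."""
    z1, z2 = atanh(r1), atanh(r2)
    return (z1 - z2) / sqrt(1 / (n1 - 3) + 1 / (n2 - 3))


tau = compare_correlations(0.324, 90, 0.278, 130)
print(f"tau = {tau:.3f}")   # about 0.364
```
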


19.6 Frequency $\chi^2$

Suppose a population consists of k mutually exclusive classes, the
proportion of members falling in the ith class being $p_i$, i=1, 2, ......, k.
This classification may be with respect to either an attribute or a
variable. (In the case of a continuous variable, the classification will
necessarily be artificial, being achieved by dividing the whole range
of the variable into k arbitrarily defined intervals.) Obviously,

$$\sum_{i=1}^{k} p_i = 1.$$

If a random sample of size n be drawn from this population, the
drawings being mutually independent, then the probability that $f_1$ of
the members of the sample will belong to the first class, $f_2$ to the
second, ......, $f_k$ to the last is

$$\frac{n!}{f_1!\,f_2!\cdots f_k!}\; p_1^{f_1} p_2^{f_2} \cdots p_k^{f_k}. \tag{19.25}$$

For k=2, it defines a binomial distribution. In the general case
the distribution is called a multinomial distribution. Note that (19.25)
may be expressed in the form

$$\frac{\dfrac{\exp(-np_1)(np_1)^{f_1}}{f_1!} \times \dfrac{\exp(-np_2)(np_2)^{f_2}}{f_2!} \times \cdots \times \dfrac{\exp(-np_k)(np_k)^{f_k}}{f_k!}}{\dfrac{\exp[-(np_1+np_2+\cdots+np_k)]\,(np_1+np_2+\cdots+np_k)^n}{n!}}. \tag{19.25a}$$

It is known that the sum of k mutually independent Poisson
variables, say, $x_i$ (i=1, 2, ......, k) with parameters $\lambda_i$ is itself a
Poisson variable with parameter $\sum_i \lambda_i$. Hence in the form (19.25a)
the multinomial distribution appears as the conditional distribution
of k independent Poisson variables $f_1, f_2, \ldots, f_k$, under the
condition

$$f_1 + f_2 + \cdots + f_k = n. \tag{19.26}$$

Now, it is also known that a Poisson variable with parameter $\lambda$
tends to normality if $\lambda \to \infty$. In the present case, the variables are $f_i$,
with parameters $np_i$. Hence for each i,

$$(f_i - np_i)/\sqrt{np_i}$$

is approximately a normal deviate if $np_i$ is sufficiently large. Thus,
approximately,

$$\sum_{i=1}^{k} \left(\frac{f_i - np_i}{\sqrt{np_i}}\right)^2 = \sum_{i=1}^{k} \frac{(f_i - np_i)^2}{np_i} \tag{19.27}$$

comes out to be the sum of squares of k normal deviates. The
approximate normal deviates are, however, subject to the linear
constraint (19.26), which may be written in the form

$$\sum_{i} \sqrt{np_i}\left(\frac{f_i - np_i}{\sqrt{np_i}}\right) = 0. \tag{19.26a}$$

In consequence, (19.27) will have approximately the $\chi^2$ distribution
with df=k-1, provided the theoretical frequencies $np_i$ are
large enough. In statistical literature, (19.27) is referred to as a
Pearsonian $\chi^2$ (after Karl Pearson) or a frequency $\chi^2$.

For computational purposes, one may use the following simplified
form of (19.27) :

$$\sum_{i} \frac{f_i^2}{np_i} - n. \tag{19.27a}$$

For k=2, (19.27) becomes

$$\frac{(f_1-np_1)^2}{np_1} + \frac{(f_2-np_2)^2}{np_2} = (f_1-np_1)^2\left[\frac{1}{np_1}+\frac{1}{np_2}\right]$$

(since $f_1+f_2 = np_1+np_2 = n$, so that $f_2-np_2 = -(f_1-np_1)$)

$$= \frac{(f_1-np_1)^2}{np_1(1-p_1)} = \frac{n(f_1/n - p_1)^2}{p_1(1-p_1)},$$

$f_1/n$ being the sample proportion for the first class. This is
approximately a $\chi^2$ with df=1, a fact of which we are already aware, since

$$\frac{\sqrt{n}\,(f_1/n - p_1)}{\sqrt{p_1(1-p_1)}}$$

is known to be approximately a normal deviate.

It has been stated above that for this $\chi^2$ approximation to be
valid, the theoretical frequencies $np_i$ should be sufficiently large.
As a working lower limit, we may take 5, since both practical and
theoretical investigations show that the approximation is usually
satisfactory if $np_i \ge 5$ for each i, provided the number of classes is
also greater than or equal to 5. If the number of classes is smaller
than 5, it is advisable to have each of the expected frequencies
somewhat greater than 5. When it is found that for some class $np_i$ is less
than 5, one should amalgamate or coalesce this class with one or
more of the adjacent classes so as to make the theoretical frequency
in the combined class greater than or equal to 5. The number of
degrees of freedom will then be :

(number of classes after coalescing) - 1.

In fact, some recent studies made by Cochran [1] suggest that if
relatively few expected frequencies are less than 5 (say, just one out of
five or more, or two out of ten or more), then even as low a value as 1
is allowable for an expected frequency in using the $\chi^2$ approximation.

We shall see presently how this statistic may be used to solve
various problems in hypothesis-testing.
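
The two forms (19.27) and (19.27a), and the reduction to a squared normal deviate when k=2, can be verified numerically. In the Python sketch below the function name `pearson_chi2` is our own, and the data of Example 19.3 (59 cured and 21 not cured out of 80, with hypothetical proportion 0.9) are reused purely for illustration.

```python
from math import sqrt


def pearson_chi2(observed, probs):
    """Frequency chi-square in the forms (19.27) and (19.27a); df = k - 1."""
    n = sum(observed)
    expected = [n * p for p in probs]
    long_form = sum((f - e) ** 2 / e for f, e in zip(observed, expected))
    short_form = sum(f * f / e for f, e in zip(observed, expected)) - n
    small = [e for e in expected if e < 5]      # classes that would need coalescing
    return long_form, short_form, len(observed) - 1, small


long_form, short_form, df, small = pearson_chi2([59, 21], [0.9, 0.1])
tau = sqrt(80) * (59 / 80 - 0.9) / sqrt(0.9 * 0.1)   # statistic (19.5) for the same data
print(f"{long_form:.4f} {short_form:.4f} {tau ** 2:.4f}")   # the three values agree
```
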


19.7 Test for goodness of fit: hypothetical population com- 
pletely specified 

In earlier chapters, we considered tests for parametric hypotheses. In developing such tests, we made the assumption that the parent distribution is of a specified nature (in most cases it was assumed that the distribution is of the normal type). We shall now consider hypotheses of a more fundamental nature, where these assumptions themselves are questioned and one seeks to verify them on the basis of sample observations.

Let us first take up the case where the hypothetical population is completely specified, there being no unknown parameter in its distribution. Let us visualise the population as being composed of $k$ mutually exclusive classes, and let us suppose that, according to the hypothesis, the population proportion in the $i$th class is $p_i^0$. If the frequency in the $i$th class in random samples of size $n$ from this population be denoted by $f_i$, we find from Section 19.6 that, under the hypothesis,

$$\sum_{i=1}^{k}\frac{(f_i-np_i^0)^2}{np_i^0} = \sum_{i=1}^{k}\frac{f_i^2}{np_i^0} - n \qquad (19.28)$$

is approximately a $\chi^2$ with df $=k-1$, provided $np_i^0$ is large enough for each $i$. The $\chi^2$ statistic, therefore, provides an approximate test for the hypothesis. The greater the differences between the observed frequencies, $f_i$, and the frequencies expected under the hypothesis, $np_i^0$, the greater will be the value of (19.28). Hence it would appear that a very high value of (19.28) should indicate falsity of the given hypothesis. If $\alpha$ be the chosen level of significance, then our test procedure consists in the rejection of the hypothesis if in the given sample $\sum_i f_i^2/np_i^0 - n$ exceeds $\chi^2_{\alpha,\,k-1}$ and in its acceptance otherwise.

Since our task here is to see how well the expected frequencies $np_i^0$ are in agreement with (or how well they fit) the observed frequencies $f_i$, such a test is also called a test for goodness of fit.

In order that the fit may be considered good, it is necessary that the above-stated hypothesis be accepted at a high level of significance, say a level like 0.3, rather than 0.05 or 0.01. It is better still to state the probability of the observed value of (19.28) being exceeded under the hypothesis, to indicate how good (or bad) the fit has been.

Example 19.10 In the course of an experiment on the breeding of peas, a botanist obtained 556 peas, of which 315 were round and yellow, 108 were round and green, 101 were angular and yellow, and 32 were angular and green. According to a genetic theory, such peas should be obtained in the ratios 9 : 3 : 3 : 1. Are the experimental results compatible with this theory?

If we denote by $p_1$, $p_2$, $p_3$ and $p_4$ the proportions of peas in the four classes in the whole population of peas that may be obtained in experiments of this type, then the null hypothesis to be tested is

$$H_0: p_1 = 9/16,\; p_2 = 3/16,\; p_3 = 3/16,\; p_4 = 1/16.$$

Let us assume that the given 556 peas have been taken randomly and independently from this population. A test for the hypothesis $H_0$ is then supplied by the statistic $\sum_{i=1}^{4} f_i^2/np_i^0 - n$, which is, under $H_0$, distributed as approximately a $\chi^2$ with df $=3$. The calculation of this $\chi^2$ for the given sample is shown in the following table.


        (1)                  (2)                  (3)             (4)
       Class          Observed frequency   Expected frequency   (2)²/(3)

Round and yellow             315                312.75           317.266
Round and green              108                104.25           111.885
Angular and yellow           101                104.25            97.851
Angular and green             32                 34.75            29.468

Total                        556                556.00           556.470

Hence

$$\chi^2 = 556.470 - 556 = 0.470.$$

The tabulated values are $\chi^2_{.05,3} = 7.815$ and $\chi^2_{.01,3} = 11.345$.

The observed $\chi^2$ being insignificant at both the levels of significance, the experimental results seem compatible with the genetic theory.

Indeed, under the hypothesis $H_0$, $P[\chi^2 > 0.470] = 0.925$, so that the hypothetical model may be said to have given a very good fit to the data.
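The arithmetic of Example 19.10 can be reproduced with a short script. This is a sketch only: the function names are illustrative, and the tail probability uses the closed form $P[\chi^2_3 > x] = \operatorname{erfc}(\sqrt{x/2}) + \sqrt{2x/\pi}\,e^{-x/2}$, which holds for 3 degrees of freedom only and is brought in here as a convenience (it is not a formula from the text).

```python
import math

def goodness_of_fit(observed, proportions):
    """Return the frequency chi-square in its direct and simplified forms."""
    n = sum(observed)
    expected = [n * p for p in proportions]
    direct = sum((f - e) ** 2 / e for f, e in zip(observed, expected))
    shortcut = sum(f * f / e for f, e in zip(observed, expected)) - n
    return direct, shortcut

def upper_tail_df3(x):
    """P[chi-square with 3 df > x]; closed form valid for df = 3 only."""
    return math.erfc(math.sqrt(x / 2.0)) + math.sqrt(2.0 * x / math.pi) * math.exp(-x / 2.0)

observed = [315, 108, 101, 32]                 # pea counts from Example 19.10
proportions = [9 / 16, 3 / 16, 3 / 16, 1 / 16]  # the 9:3:3:1 hypothesis
direct, shortcut = goodness_of_fit(observed, proportions)
p_value = upper_tail_df3(direct)
print(round(direct, 3), round(shortcut, 3), round(p_value, 3))
```

Both forms of the statistic give the same value, which is the point of the simplified formula: it avoids computing the individual differences between observed and expected frequencies.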


19.8 Test for goodness of fit: some parameters of hypothetical 
population unknown 

This is the more usual form of the problem of testing for goodness of fit. It differs from the preceding problem in that now the proportions $p_i^0$ are not completely specified by the hypothesis but are dependent on some unknown parameter or parameters. Such is the case, for instance, when the hypothesis says that the population distribution is of the Poisson or of the normal type, without specifying the value of $\lambda$ or those of $\mu$ and $\sigma$.

In the general case, let us suppose that the hypothetical population depends on $r$ unknown parameters ($r < k-1$), which may be estimated from the given sample itself. If the appropriate estimators of the parameters (e.g. those obtained by the method of moments) are considered and if the corresponding estimators of the population proportions are denoted by $\hat p_i^0$, then

$$\sum_{i=1}^{k}\frac{(f_i-n\hat p_i^0)^2}{n\hat p_i^0} = \sum_{i=1}^{k}\frac{f_i^2}{n\hat p_i^0} - n \qquad (19.29)$$

will still be approximately a $\chi^2$ if $n\hat p_i^0$ are large enough. However, the number of degrees of freedom will get reduced, for the estimation of each parameter imposes a homogeneous linear constraint on the (approximate) normal deviates

$$(f_i - n\hat p_i^0)/\sqrt{n\hat p_i^0}.$$

The number of degrees of freedom of the above $\chi^2$ statistic will, therefore, be

$$(k-1) - (\text{number of parameters estimated}) = k-r-1. \qquad (19.30)$$

Example 19.11 In Example 10.4 a normal distribution was fitted to the observed distribution given in Table 6.10. We shall assume that the given 177 persons have been taken randomly and independently from the population of all Indian adult males. We may then test for the goodness of fit of the normal distribution (i.e., we may judge whether or not the population distribution of height may be supposed to be of the normal type) by means of the $\chi^2$ statistic.

The computation of the $\chi^2$ is shown in the table below. It may be noted that here the first three and the last three class-intervals of the variable (vide Table 10.4) have been amalgamated to form only two intervals. This has been done in order to make the expected frequency in each class greater than or equal to 5.


       (1)                (2)                  (3)             (4)
  Class-interval   Observed frequency   Expected frequency   (2)²/(3)

  below 154.55             4                  5.553             2.881
  154.55-159.55           24                 24.860            23.170
  159.55-164.55           58                 55.687            60.409
  164.55-169.55           60                 57.371            62.749
  169.55-174.55           27                   ...             26.915
  174.55 and above         4                   ...              2.483


From the above table,

$$\chi^2 = 178.607 - 177 = 1.607.$$

Remembering that in fitting the normal distribution, two parameters had to be estimated from the sample, we see that the $\chi^2$ statistic now has df $= 5-2 = 3$. The relevant tabulated values are $\chi^2_{.05,3} = 7.815$ and $\chi^2_{.01,3} = 11.345$. Since the observed $\chi^2$ is much smaller than the tabulated values, and since indeed $P[\chi^2 > 1.607]$ is, under the hypothesis of normality, as high as 0.657, the normal distribution seems to have given a very good fit.


19.9 Test for homogeneity 

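Suppose there are $l$ similarly classified populations. Let $k$ be the number of classes in each population and $p_{ij}$ the proportion of the $j$th population in the $i$th class ($i = 1, 2, \ldots, k$; $j = 1, 2, \ldots, l$). The populations may be represented as follows:

$$\begin{array}{c|ccccc}
 & \multicolumn{5}{c}{\text{Population}} \\
\text{Class} & 1 & 2 & 3 & \cdots & l \\ \hline
1 & p_{11} & p_{12} & p_{13} & \cdots & p_{1l} \\
2 & p_{21} & p_{22} & p_{23} & \cdots & p_{2l} \\
3 & p_{31} & p_{32} & p_{33} & \cdots & p_{3l} \\
\vdots & \vdots & \vdots & \vdots & & \vdots \\
k & p_{k1} & p_{k2} & p_{k3} & \cdots & p_{kl} \\ \hline
\text{Total} & 1 & 1 & 1 & \cdots & 1
\end{array}$$

When $p_{ij}$ are unknown, one may want to decide if the $l$ population distributions may be supposed to be identical (or homogeneous). One has then to test the hypothesis

$$H_0: p_{i1} = p_{i2} = \cdots = p_{il} \quad \text{for each } i.$$

Let a random sample of size $n_j$ be drawn from the $j$th population ($j = 1, 2, \ldots, l$), the drawings being mutually independent, and let the number of members of this sample which belong to the $i$th class be $f_{ij}$. We have then

$$\sum_{i=1}^{k} f_{ij} = n_j.$$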


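The position may be visualised from the following table:

$$\begin{array}{c|ccccc|c}
 & \multicolumn{5}{c|}{\text{Sample}} & \\
\text{Class} & 1 & 2 & 3 & \cdots & l & \text{Total} \\ \hline
1 & f_{11} & f_{12} & f_{13} & \cdots & f_{1l} & f_{10} \\
2 & f_{21} & f_{22} & f_{23} & \cdots & f_{2l} & f_{20} \\
3 & f_{31} & f_{32} & f_{33} & \cdots & f_{3l} & f_{30} \\
\vdots & \vdots & \vdots & \vdots & & \vdots & \vdots \\
k & f_{k1} & f_{k2} & f_{k3} & \cdots & f_{kl} & f_{k0} \\ \hline
\text{Total} & n_1 & n_2 & n_3 & \cdots & n_l & n
\end{array}$$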


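In the present set-up, for each $j$,

$$\sum_{i=1}^{k}\frac{(f_{ij}-n_jp_{ij})^2}{n_jp_{ij}}$$

is distributed as approximately a $\chi^2$ with df $=k-1$. Hence

$$\sum_{j=1}^{l}\sum_{i=1}^{k}\frac{(f_{ij}-n_jp_{ij})^2}{n_jp_{ij}},$$

being the sum of $l$ independent $\chi^2$'s each with df $=k-1$, is itself a $\chi^2$ with df $=(k-1)l$. According to the hypothesis, therefore,

$$\sum_{j=1}^{l}\sum_{i=1}^{k}\frac{(f_{ij}-n_jp_i^0)^2}{n_jp_i^0}, \qquad (19.31)$$

where $p_i^0$ is the common value of $p_{ij}$ for all $j$, is approximately a $\chi^2$ with df $=(k-1)l$. This statistic could be used to test $H_0$ if $p_i^0$ were known quantities. Suppose we replace each $p_i^0$ by its estimator. The proper estimator is the sample proportion obtained by combining all samples, viz.

$$\hat p_i^0 = \frac{f_{i0}}{n},$$

where $f_{i0} = \sum_{j} f_{ij}$ and $n = \sum_{j} n_j$. Then the frequency $\chi^2$ takes the form

$$\sum_{j=1}^{l}\sum_{i=1}^{k}\frac{\left(f_{ij}-\dfrac{n_jf_{i0}}{n}\right)^2}{\dfrac{n_jf_{i0}}{n}} = n\left(\sum_{i=1}^{k}\sum_{j=1}^{l}\frac{f_{ij}^2}{f_{i0}\,n_j} - 1\right). \qquad (19.32)$$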

Because of this estimation, the number of degrees of freedom will get reduced by $(k-1)$, and not by $k$, since when $k-1$ proportions are estimated, the remaining one is automatically determined by virtue of the property that the sum of all proportions is unity.

The hypothesis $H_0$ will, therefore, be rejected or accepted according as

$$n\left(\sum_{i=1}^{k}\sum_{j=1}^{l}\frac{f_{ij}^2}{f_{i0}\,n_j} - 1\right)$$

exceeds $\chi^2_{\alpha,(k-1)(l-1)}$ or not, $\chi^2_{\alpha,(k-1)(l-1)}$ being the upper $\alpha$-point of the $\chi^2$-distribution with

$$\text{df} = (k-1)l - (k-1) = (k-1)(l-1). \qquad (19.33)$$

19.10 Test for independence


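Let a population be classified according to two attributes, $A$ and $B$, into $k$ and $l$ classes, respectively, say,

$$A_1, A_2, \ldots, A_k \quad\text{and}\quad B_1, B_2, \ldots, B_l.$$

Let $p_{ij}$ be the proportion of members of the population belonging simultaneously to the $i$th class of $A$ (i.e. $A_i$) and the $j$th class of $B$ (i.e. $B_j$). The structure of the population will then be as follows:

$$\begin{array}{c|cccc|c}
 & B_1 & B_2 & \cdots & B_l & \text{Total} \\ \hline
A_1 & p_{11} & p_{12} & \cdots & p_{1l} & p_{10} \\
A_2 & p_{21} & p_{22} & \cdots & p_{2l} & p_{20} \\
\vdots & \vdots & \vdots & & \vdots & \vdots \\
A_k & p_{k1} & p_{k2} & \cdots & p_{kl} & p_{k0} \\ \hline
\text{Total} & p_{01} & p_{02} & \cdots & p_{0l} & 1
\end{array}$$

The proportions $p_{ij}$ define the joint distribution of $A$ and $B$. The marginal totals

$$p_{i0} = \sum_{j=1}^{l} p_{ij}$$

give the marginal distribution of $A$; while the other marginal totals,

$$p_{0j} = \sum_{i=1}^{k} p_{ij},$$

give the marginal distribution of $B$.

When $p_{ij}$ are unknown, we may enquire whether $A$ and $B$ are independent. We have then to test the hypothesis

$$H_0: p_{ij} = p_{i0} \times p_{0j} \quad (\text{for all } i, j).$$

Let a random sample of size $n$ be drawn from the population, the drawings being mutually independent. If we denote by $f_{ij}$ the number of members of the sample that belong to the $i$th class of $A$ and to the $j$th class of $B$, then, under the hypothesis $H_0$,

$$\sum_{i=1}^{k}\sum_{j=1}^{l}\frac{(f_{ij}-np_{i0}p_{0j})^2}{np_{i0}p_{0j}} \qquad (19.34)$$

will be distributed as approximately a $\chi^2$ with df $=kl-1$. This statistic could be used to test $H_0$ if $p_{i0}$ and $p_{0j}$ were known quantities. In the present case, however, they are unknown and have to be estimated from the sample itself.

The proper estimator of $p_{i0}$, which is the population proportion in the class $A_i$, is the corresponding sample proportion $f_{i0}/n$, where $f_{i0} = \sum_{j} f_{ij}$. Similarly, the proper estimator of $p_{0j}$ is $f_{0j}/n$, where $f_{0j} = \sum_{i} f_{ij}$. The structure of the sample is shown below:

$$\begin{array}{c|cccc|c}
 & B_1 & B_2 & \cdots & B_l & \text{Total} \\ \hline
A_1 & f_{11} & f_{12} & \cdots & f_{1l} & f_{10} \\
A_2 & f_{21} & f_{22} & \cdots & f_{2l} & f_{20} \\
\vdots & \vdots & \vdots & & \vdots & \vdots \\
A_k & f_{k1} & f_{k2} & \cdots & f_{kl} & f_{k0} \\ \hline
\text{Total} & f_{01} & f_{02} & \cdots & f_{0l} & n
\end{array}$$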


Substituting these estimators for $p_{i0}$ and $p_{0j}$ in (19.34), we get the new statistic

$$\sum_{i=1}^{k}\sum_{j=1}^{l}\frac{\left(f_{ij}-\dfrac{f_{i0}f_{0j}}{n}\right)^2}{\dfrac{f_{i0}f_{0j}}{n}} = n\left(\sum_{i=1}^{k}\sum_{j=1}^{l}\frac{f_{ij}^2}{f_{i0}f_{0j}} - 1\right). \qquad (19.35)$$

We are using $k+l$ estimators, of which $(k-1)+(l-1)$ are independent. For, given $(k-1)$ of the estimators for $p_{i0}$, the other automatically follows; and similarly, given $(l-1)$ of the estimators for $p_{0j}$, the other is automatically determined. Hence (19.35) is distributed as approximately a $\chi^2$ with

$$(kl-1) - (k-1) - (l-1) = (k-1)(l-1) \qquad (19.36)$$

degrees of freedom.

Thus we see that, although the problem discussed here is different from that in the preceding section, the solution of each is formally the same, the $\chi^2$ statistic used in each case being of the form

$$(\text{grand total})\left[\sum\frac{(\text{class frequency})^2}{(\text{row total})(\text{column total})} - 1\right]$$

with df $=$ (no. of rows $-1$) $\times$ (no. of columns $-1$).

19.11 Simplified formulae

When either $k$ or $l$ is equal to 2, the expressions (19.32) and (19.35) reduce to much simpler forms.

In the first place, suppose $l=2$ and $k>2$. Here the sample frequencies and their totals may be represented as follows:

$$\begin{array}{c|cc|c}
\text{Class} & 1 & 2 & \text{Total} \\ \hline
1 & a_1 & b_1 & T_1 \\
2 & a_2 & b_2 & T_2 \\
3 & a_3 & b_3 & T_3 \\
\vdots & \vdots & \vdots & \vdots \\
k & a_k & b_k & T_k \\ \hline
\text{Total} & T_a & T_b & n
\end{array}$$

In terms of these symbols, the expression corresponding to (19.32) or (19.35) is now

$$\chi^2 = n\left[\sum_{i=1}^{k}\frac{a_i^2}{T_iT_a} + \sum_{i=1}^{k}\frac{b_i^2}{T_iT_b} - 1\right] = \frac{n^2}{T_aT_b}\left[\sum_{i=1}^{k}\frac{a_i^2}{T_i} - \frac{T_a^2}{n}\right], \qquad (19.37a)$$

since $b_i = T_i - a_i$ and $T_b = n - T_a$. This formula (called the Snedecor-Brandt formula) or its equivalent,

$$\chi^2 = \frac{n^2}{T_aT_b}\left[\sum_{i=1}^{k}\frac{b_i^2}{T_i} - \frac{T_b^2}{n}\right], \qquad (19.37b)$$

will be found more convenient for computational purposes.
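As a check on the simplification, the general form (grand total) × [Σ (class frequency)² / (row total × column total) − 1] and the shortcut $\frac{n^2}{T_aT_b}\left[\sum a_i^2/T_i - T_a^2/n\right]$ can be compared numerically. The sketch below is illustrative only: the function names are assumptions, and the figures are simply the first two machine columns of Example 19.12, borrowed as convenient numbers.

```python
def chi_square_general(a, b):
    """n * (sum of f^2 / (row total * column total) - 1) for a k x 2 table."""
    totals = [ai + bi for ai, bi in zip(a, b)]
    t_a, t_b = sum(a), sum(b)
    n = t_a + t_b
    s = sum(ai ** 2 / (ti * t_a) + bi ** 2 / (ti * t_b)
            for ai, bi, ti in zip(a, b, totals))
    return n * (s - 1.0)

def chi_square_shortcut(a, b):
    """Shortcut that uses only one column of frequencies and the totals."""
    totals = [ai + bi for ai, bi in zip(a, b)]
    t_a, t_b = sum(a), sum(b)
    n = t_a + t_b
    return n ** 2 / (t_a * t_b) * (sum(ai ** 2 / ti for ai, ti in zip(a, totals)) - t_a ** 2 / n)

a = [15, 16, 19]   # first column of frequencies (illustrative)
b = [9, 18, 15]    # second column of frequencies (illustrative)
general = chi_square_general(a, b)
shortcut_a = chi_square_shortcut(a, b)
shortcut_b = chi_square_shortcut(b, a)   # same formula with the columns swapped
print(general, shortcut_a, shortcut_b)
```

All three values should coincide, which shows that either column may be used in the shortcut.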
In case $k$ and $l$ are both equal to 2, i.e. for a $2\times 2$ table, the sample frequencies and the totals may be written as follows:

$$\begin{array}{c|cc|c}
 & \multicolumn{2}{c|}{\text{Column}} & \\
\text{Row} & 1 & 2 & \text{Total} \\ \hline
1 & a & b & a+b \\
2 & c & d & c+d \\ \hline
\text{Total} & a+c & b+d & n
\end{array}$$

The difference between any observed frequency, such as $a$ or $b$, and the corresponding expected frequency, such as $\dfrac{(a+b)(a+c)}{n}$ or $\dfrac{(a+b)(b+d)}{n}$, is numerically the same for all four cells of the table, being equal to $|ad-bc|/n$. Hence the approximate $\chi^2$ statistic for testing for homogeneity or independence (as the case may be) will now be

$$\frac{(ad-bc)^2}{n^2}\; n\left[\frac{1}{(a+b)(a+c)} + \frac{1}{(a+b)(b+d)} + \frac{1}{(c+d)(a+c)} + \frac{1}{(c+d)(b+d)}\right]$$

$$= \frac{(ad-bc)^2}{n}\cdot\frac{n^2}{(a+b)(c+d)(a+c)(b+d)} = \frac{(ad-bc)^2\, n}{(a+b)(c+d)(a+c)(b+d)}. \qquad (19.38)$$

This again will be found much easier to apply than the original formula.


19.12 Yates’ correction for continuity 

We have stated in Section 19.5 that, for the validity of the $\chi^2$ approximation, it is necessary that the expected frequency in each class should be sufficiently large, say at least as high as 5. When some expected frequency is too small, we coalesce some of the classes in order to satisfy this condition. However, it should be apparent that this procedure is ruled out in the case of testing for homogeneity or independence in a $2\times 2$ table.

Yates has suggested a correction to be applied to the observed frequencies in a $2\times 2$ table in case any expected frequency is found to be too small. This consists in increasing or decreasing the observed frequencies by $\frac12$, in such a way that the marginal totals remain unaltered. If we consider the $2\times 2$ table of Section 19.11, then it is necessary to increase $a$ and $d$ by $\frac12$ each (while each of $b$ and $c$ is to be decreased by $\frac12$) in case $ad < bc$. On the other hand, if $ad > bc$, then $a$ and $d$ have to be decreased by $\frac12$ each, while each of $b$ and $c$ has to be increased by the same amount. Since the marginal totals remain unaltered, obviously the expected frequencies are not to be changed.

If Yates' correction is applied, the formula for $\chi^2$, corresponding to (19.38), will be

$$\chi^2 = \frac{\{|ad-bc| - n/2\}^2\, n}{(a+b)(c+d)(a+c)(b+d)}. \qquad (19.39)$$


Example 19.12 The breakdowns occurring during a year for each 
of 4 machines in a factory were classified as follows according to 
shift, there being 3 shifts daily. Judge whether the differences 
among the four distributions may be attributed to sampling 
fluctuations alone. 


                     Machine
  Shift       1      2      3      4

    1        15      9     18     20
    2        16     18     29     31
    3        19     15     19     27

  Total      50     42     66     78


Here we are to test for the homogeneity of the four frequency distributions corresponding to the four machines. Assuming that the required conditions are satisfied, we may apply the $\chi^2$-test, with

$$\chi^2 = n\left(\sum_{i}\sum_{j}\frac{f_{ij}^2}{f_{i0}f_{0j}} - 1\right), \quad \text{df} = 3\times 2 = 6.$$

For the given samples, $f_{ij}$ and the products $f_{i0}f_{0j}$ are shown in the table below, together with $f_{ij}^2/f_{i0}f_{0j}$.

  Cell frequency    Row total × column total    f_ij² / (f_i0 f_0j)
       f_ij                 f_i0 f_0j

        15                    3,100                  0.07258
         9                    2,604                  0.03111
        18                    4,092                  0.07918
        20                    4,836                  0.08271
        16                    4,700                  0.05447
        18                    3,948                  0.08207
        29                    6,204                  0.13556
        31                    7,332                  0.13107
        19                    4,000                  0.09025
        15                    3,360                  0.06696
        19                    5,280                  0.06837
        27                    6,240                  0.11683

  Total 236                     -                    1.01116


Hence

$$\chi^2 = 236(1.01116 - 1) = 2.632.$$

On comparing this with $\chi^2_{.05,6} = 12.592$ and $\chi^2_{.01,6} = 16.812$, it is seen that the hypothesis of homogeneity is to be accepted. That is to say, the observed differences among the four distributions may be attributed to sampling fluctuations alone.
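The computation in Example 19.12 can be reproduced for any two-way table of frequencies. The sketch below is a minimal illustration; the function name is an assumption, and the critical values still have to be read from a $\chi^2$ table as in the text.

```python
def contingency_chi_square(table):
    """Return (chi-square, df) for a two-way table given as a list of rows.

    Uses n * (sum of f^2 / (row total * column total) - 1), with
    df = (rows - 1) * (columns - 1).
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    s = sum(f * f / (row_totals[i] * col_totals[j])
            for i, row in enumerate(table)
            for j, f in enumerate(row))
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return n * (s - 1.0), df

# Breakdowns by shift (rows) and machine (columns), from Example 19.12.
breakdowns = [
    [15, 9, 18, 20],
    [16, 18, 29, 31],
    [19, 15, 19, 27],
]
chi2, df = contingency_chi_square(breakdowns)
print(round(chi2, 3), df)
```

The same function serves for the test of independence of the next kind of problem, since the two statistics are formally identical.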

Example 19.13 Eighty-eight residents of an Indian city, who were 
interviewed during a sample survey, are classified below as male 
or female and also as drinkers or non-drinkers of tea. Do these data 
reveal any association between sex and drinking of tea ? 


Male Female Total 
Drink tea 40 33 73 
Do not drink tea 3 12 15 
Total 43 45 88 


In order to test for the null hypothesis that the two attributes are independent in the population of residents, we shall use the $\chi^2$-test. For the present data,

$$\chi^2 = \frac{(ad-bc)^2\, n}{(a+b)(c+d)(a+c)(b+d)} = \frac{(381)^2\times 88}{73\times 15\times 43\times 45} = \frac{12{,}774{,}168}{2{,}118{,}825} = 6.029.$$

Since two of the expected cell-frequencies are rather small, it would, however, be proper to use Yates' correction for continuity. On applying this correction, we have

$$\chi^2 = \frac{\{|ad-bc| - n/2\}^2\, n}{(a+b)(c+d)(a+c)(b+d)} = \frac{(337)^2\times 88}{73\times 15\times 43\times 45} = \frac{9{,}994{,}072}{2{,}118{,}825} = 4.717.$$

The tabulated values are $\chi^2_{.05,1} = 3.841$ and $\chi^2_{.01,1} = 6.635$.

If we choose the 5% level of significance, then the null hypothesis should be rejected; i.e., the two attributes should be taken to be associated in the population. It may be noted that although both of the computed $\chi^2$ values are significant at the 5% level, the non-application of Yates' correction grossly exaggerates this significance.
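The two computations of Example 19.13 may be sketched as follows. The function name and the `yates` flag are illustrative; the guard that keeps the corrected difference from going below zero is an added assumption and is not part of the formula in the text.

```python
def chi_square_2x2(a, b, c, d, yates=False):
    """Chi-square for a 2 x 2 table with cells a, b (row 1) and c, d (row 2)."""
    n = a + b + c + d
    diff = abs(a * d - b * c)
    if yates:
        # Yates' correction: reduce |ad - bc| by n/2 (never below zero; assumption).
        diff = max(diff - n / 2.0, 0.0)
    return n * diff ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))

# Tea-drinking data of Example 19.13: rows = drink / do not drink, columns = male / female.
plain = chi_square_2x2(40, 33, 3, 12)
corrected = chi_square_2x2(40, 33, 3, 12, yates=True)
print(round(plain, 3), round(corrected, 3))
```

Comparing both values with the tabulated 5% and 1% points shows how much the uncorrected statistic overstates the evidence when some expected frequencies are small.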




Questions and exercises 


19.1 What are large-sample approximations in statistical 
inference ? Comment on their merits and limitations. 

19.2 Describe large-sample tests for one and two binomial 
proportions, and one and two Poisson means. 

19.3 (a) Supposing that $r_i$ ($i = 1, 2, \ldots, k$) are sample correlations for independent random samples of sizes $n_i$ ($i = 1, 2, \ldots, k$) from a bivariate normal distribution with unknown correlation $\rho$, indicate how you would combine the $k$ sample correlations to get an estimate of $\rho$. Partial ans. Estimate of $\rho$ to be obtained from

$$\bar z = \sum_i (n_i-3)z_i \Big/ \sum_i (n_i-3).$$

(b) Suggest a test, based on the $z$-transformation, for the hypothesis that $k$ bivariate normal distributions have the same correlation coefficient. Partial ans. $\chi^2 = \sum_i (n_i-3)(z_i-\bar z)^2 = \sum_i (n_i-3)z_i^2 - \left\{\sum_i (n_i-3)z_i\right\}^2 \Big/ \sum_i (n_i-3)$ with df $=k-1$.
19.4 What is a Pearsonian $\chi^2$ statistic? Describe its important uses in statistical inference.

19.5 Suppose two drugs are administered to each of $n$ patients suffering from headaches. The reactions are summarised in the following table:


Starting from the marginals $a+b$ and $a+c$, show that a comparison of the two drugs can be made by means of the statistic $(b-c)^2/(b+c)$, which is approximately a $\chi^2$ with df $=1$.

19.6 Suggest a suitable large-sample test for comparing the probabilities, say $p_i$ and $p_j$, corresponding to two categories of a population with $k\,(>2)$ categories. (The sample frequencies $f_i$, $i = 1, 2, \ldots, k$, may be assumed to be distributed in the multinomial form.)

[Hint: $E(f_i) = np_i$, $\mathrm{var}(f_i) = np_i(1-p_i)$ and $\mathrm{cov}(f_i, f_j) = -np_ip_j$ for $i \ne j$.]

19.7 Writing $p_i = a_i/T_i$, $p = T_a/n = \sum T_ip_i / \sum T_i$ and $q = 1-p$, show that the formula for $\chi^2$ in the $2\times k$ case (vide Section 19.11) can be reduced to the form

$$\chi^2 = \frac{1}{pq}\left\{\sum_i T_ip_i^2 - np^2\right\}.$$

Deduce also a third alternative form:
Sai aj, bi s k 
earn zi r) [7] 


19.8 Suppose a linear combination of independent $\chi^2$ statistics, i.e. $\chi_1^2, \chi_2^2, \ldots, \chi_k^2$ (with df's $\nu_1, \nu_2, \ldots, \nu_k$, respectively), say $\sum a_i\chi_i^2$, where $a_i > 0$ for each $i$, is to be approximated with a new statistic of the form $a\chi^2$ with df $=\nu$ (say). What should be the values of $a$ and $\nu$ in order that the mean and variance of the approximating statistic may be the same as those of the original statistic?
Ans. $a = \sum a_i^2\nu_i \big/ \sum a_i\nu_i$, $\nu = \left(\sum a_i\nu_i\right)^2 \big/ \sum a_i^2\nu_i$.

19.9 A manufacturer of watches claims that not more than 2 per cent of his product is defective. A retail dealer buys a batch of 720 watches from the manufacturer and finds on inspection that 26 of the watches are defective. Would you consider the manufacturer's claim justified by the data? Partial ans. $\tau = 3.088$.

19.10 With the above data, obtain 95% confidence limits for the true percentage of defective watches. Ans. 2.25% and 4.97%.


19.11 A random sample of 826 students taken from Calcutta colleges in 1950 contained 143 women, while a random sample of 1,214 students taken in 1960 included 385 women. Examine whether there has been a significant progress among women in respect of collegiate education. Partial ans. $\tau = -7.291$.

19.12 A sample of 80 pigs was given a certain diet, and the gain in weight over a period of 20 days was recorded for each pig. The sample mean and the sample standard deviation came out to be 32.12 lb. and 11.59 lb. Test whether the corresponding population values may be supposed to be 30 lb. and 10 lb., respectively.
Partial ans. $\tau$ (for mean) $= 1.636$; $\tau$ (for s.d.) $= 2.011$.


19.13 The correlation coefficient between brother's height and sister's height for 53 brother-sister pairs was found to be 0.585. Is this coefficient significantly smaller than 0.8? Partial ans. $\tau = -2.938$.

19.14 The correlation coefficient between sitting height and stature was found to be 0.7854 for a group of 70 adult Europeans. For a group of 39 adult Indians, on the other hand, the coefficient was 0.5209. Do the two coefficients differ significantly?
Partial ans. $\tau = 2.332$.

19.15 A six-faced die was thrown 300 times, and the number of points obtained at each throw was recorded. In this way the following frequency distribution was formed. Use these data to test whether the die was unbiassed.


Number of points 
per throw 1 2,38 4 5. £36 
Frequency | BE 52. 48240. See 77 


Partial ans. $\chi^2 = 24.52$.

19.16 The following table gives the number of weed seeds in 196 1-lb. packets of a variety of pulses and also the frequencies of the different classes as obtained by fitting a Poisson distribution.


Number of | Observed | Expected 
frequency | frequency 


weed seeds 


eon onvuvur wrdre co 


10 or more 


Test for the goodness of fit. Partial ans. $\chi^2 = 5.039$.

19.17 1,072 schoolboys were classified according to intelligence, and at the same time their economic conditions were recorded. The results are shown in the following table. Judge whether there is any association between intelligence and economic conditions.


  Economic                  Intelligence
  conditions    Excellent    Good    Mediocre    Dull

  Good              48        199       181        82
  Not good          81        185       190       106

Partial ans. $\chi^2 = 9.735$.

19.18 During a smallpox epidemic, the following data were 

collected on the basis of a survey of 222 persons vaccinated against 

the disease. Do you think that the standard of vaccination affects 
the power to resist the disease ? 


Attacked with smallpox Not attacked Total 


Well vaccinated 33 120 153 
Badly vaccinated 18 51 69 
Total 51 171 222 


Partial ans. $\chi^2 = 0.549$.

19.19 Assuming that the data of Exercise 11.12 correspond to a random sample from an infinite population, test for the mutual independence of A, B and C. Also, test whether B may be considered independent of A and C, taken jointly.
Partial ans. $\chi^2$'s have df's 4 and 3.
19.20 There are two sections in a class, having 120 and 100 
pupils, respectively. The following table gives their results in the 

half-yearly and annual examinations : 
Section I Section II 
Half-yearly exam. Half-yearly exam. 


| Pele 
Annual 
exam. Passed) 48 | 
railed 8 | 


(a) For each section, test if the annual exam. results have any association with the results of the half-yearly exam.
Partial ans. $\chi_1^2 = 53.571$, $\chi_2^2 = 42.739$.

(b) Test whether the two sections may be regarded as random samples from the same population. Partial ans. $\chi^2 = 11.371$ (df $=3$).


SUGGESTED READING

[1] Cochran, W. G. “The $\chi^2$ test of goodness of fit”, Ann. Math. Stat., 23, pp. 315-345, 1952.
[2] Cochran, W. G. “Some methods of strengthening the common $\chi^2$ tests”, Biometrics, 10, pp. 417-451, 1954.
[3] Fisher, R. A. Statistical Methods for Research Workers (Ch. 4). Oliver and Boyd, 1954.
[4] Goulden, C. H. Methods of Statistical Analysis (Chs. 15, 16). John Wiley, 1952, and Asia Publishing House, 1959.
[5] Irwin, J. O. “A note on the subdivision of $\chi^2$ into components”, Biometrika, 36, pp. 130-134, 1949.
[6] Kimball, A. W. “Short-cut formulas for the exact partition of $\chi^2$ in contingency tables”, Biometrics, 10, pp. 452-458, 1954.
[7] Maxwell, A. E. Analysing Qualitative Data (Chs. 1-7). Methuen, 1961.
[8] Mood, A. M., Graybill, F. A. and Boes, D. C. Introduction to the Theory of Statistics (Chs. 8, 9). McGraw-Hill, 1974, and Kōgakusha.
[9] Rao, C. R. Advanced Statistical Methods in Biometric Research (Chs. 5, 6). John Wiley, 1952.
[10] Yule, G. U. and Kendall, M. G. An Introduction to the Theory of Statistics (Chs. 17-20). Charles Griffin, 1950.


20 NON-PARAMETRIC METHODS


20.1 Introduction 

Most of the standard statistical techniques are optimum under certain standard assumptions, e.g. independence, homoscedasticity and normality.

Statistical methods have been called robust by G. E. P. Box if the inferences made by using them are not invalidated by the violation of the underlying assumptions. It is customary to justify the use of a normal theory criterion, in a situation where normality cannot be guaranteed, by arguing that it is robust under non-normality. A fair number of enquiries have been made into the behaviour of standard tests when something other than the standard assumptions hold. When a remedy is required for non-normality, a procedure available is to transform the original variable. But this transformation must be based on some assumed form of the population distribution.

Non-parametric methods are concerned with the treatment of standard statistical problems when the assumption of normality is replaced by general assumptions concerning the distribution function. Frequently, the variables are just assumed to come from a continuous distribution. Non-parametric methods, including K. Pearson's $\chi^2$-test for goodness of fit, rank correlation and methods based on order statistics, provide a means of avoiding the normality assumption. But non-parametric methods do nothing to avoid the assumption of independence or that of homoscedasticity.

Another term which has been freely interchanged with the term non-parametric is distribution-free. But it is better to maintain a distinction between the two as these are not synonymous. A statistical problem is parametric or non-parametric depending on whether we allow the parent distribution to depend on a finite number of parameters or leave it to be quite general, say just continuous. In other words, these terms depend on the formulation of the problem. On the other hand, if the method used to solve the problem depends neither on the form of the parent distribution nor on its parameters, then the procedure is said to be distribution-free. If the method does not depend on the parameters of the parent distribution but depends on the form, we may call it a parameter-free procedure. Thus both parametric and non-parametric problems may or may not be distribution-free. Distribution-free procedures were devised primarily for non-parametric problems. Hence we find these two terms being used interchangeably.

Non-parametric methods have certain advantages in that they 
require few assumptions, are simple to compute and can be used 
even in situations where actual measurements are unavailable and 
the data are obtained only as ranks. 


20.2 Non-parametric estimation of location and dispersion 

In non-parametric theory, the most frequently used measure of location is the population median $\theta$, which is any real solution of the equation

$$F(\theta) = 0.5,$$

where the distribution function $F(x)$ is supposed to be continuous. We assume that a unique solution exists, as would be the case for a strictly increasing $F(x)$. The measure of dispersion used in non-parametric theory is the interquartile range,

$$\xi_{3/4} - \xi_{1/4},$$

where $\xi_p$, defined by $F(\xi_p) = p$, is the $p$-quantile or quantile of order $p$. (Thus the median is $\xi_{1/2}$.)

Point estimation

A point estimate of the $p$-quantile $\xi_p$ is given by the corresponding sample quantile.

Let $x_{(r)}$ be the $r$th smallest sample observation in a sample of size $n$, also called the $r$th order statistic. Then $x_{(r)}$ is a point estimate of the $r/(n+1)$-quantile. If a population quantile relates to a $p$ lying between $r/(n+1)$ and $(r+1)/(n+1)$, then its point estimate is obtained by linear interpolation from $x_{(r)}$ and $x_{(r+1)}$. Thus the sample median is a point estimate of population location and the sample interquartile range is a point estimate of population dispersion.
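The interpolation rule can be sketched as below. The function name and the sample are hypothetical, and the handling of a $p$ lying below $1/(n+1)$ or above $n/(n+1)$ (returning the smallest or largest observation) is an added assumption, since the text does not cover those cases.

```python
import math

def sample_quantile(sample, p):
    """Estimate the p-quantile, treating x_(r) as the r/(n+1)-quantile."""
    xs = sorted(sample)
    n = len(xs)
    h = p * (n + 1)          # fractional rank corresponding to p
    r = math.floor(h)
    if r < 1:                # p below 1/(n+1): fall back on the minimum (assumption)
        return xs[0]
    if r >= n:               # p at or above n/(n+1): fall back on the maximum (assumption)
        return xs[-1]
    lower, upper = xs[r - 1], xs[r]   # x_(r) and x_(r+1)
    return lower + (h - r) * (upper - lower)

data = [7, 1, 5, 3, 6, 2, 4]          # hypothetical sample of size 7
median = sample_quantile(data, 0.5)
iqr = sample_quantile(data, 0.75) - sample_quantile(data, 0.25)
print(median, iqr)
```

For a sample of even size the median falls between two order statistics, and the interpolation then returns their midpoint.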


Interval estimation

Interval estimates for population quantiles can be easily obtained with the help of the binomial distribution.

Note that the probability is $p$ for an observation to fall to the left of $\xi_p$ and $(1-p)$ for an observation to lie to the right of $\xi_p$. Then

$$P[x_{(s)} > \xi_p] = \sum_{i=0}^{s-1}\binom{n}{i}p^i(1-p)^{n-i} \qquad (20.1)$$

and

$$P[x_{(r)} < \xi_p] = \sum_{i=r}^{n}\binom{n}{i}p^i(1-p)^{n-i}. \qquad (20.2)$$

We get the binomial probability (20.2) by noting that in $n$ independent sample values with $P[X \le \xi_p] = p$, we have $x_{(r)} \le \xi_p$ if and only if at least $r$ of the sample values are less than or equal to $\xi_p$. (20.1) and (20.2) define one-sided confidence intervals for $\xi_p$ in terms of order statistics. Thus $(-\infty, x_{(s)})$ and $(x_{(r)}, \infty)$ are one-sided confidence intervals for $\xi_p$ with confidence coefficients given by the right-hand sides of (20.1) and (20.2), respectively. If $s > r$, we obtain, from (20.1) and (20.2),

$$P[x_{(r)} < \xi_p < x_{(s)}] = \sum_{i=r}^{s-1}\binom{n}{i}p^i(1-p)^{n-i}, \qquad (20.3)$$

and this provides a two-sided confidence interval for $\xi_p$ obtained from the order statistics $x_{(r)}$ and $x_{(s)}$, with confidence coefficient given by the right-hand side of (20.3). For the median, we have to take $p = 0.50$.

Example 20.1 Let us obtain a confidence interval for the median for a sample of size 10 and based on $x_{(3)}$ and $x_{(8)}$.

We have, from (20.3),

$$P[x_{(3)} \le \xi_{1/2} \le x_{(8)}] = \sum_{i=3}^{7}\binom{10}{i}\left(\frac12\right)^{10} \approx 0.90.$$

Hence $(x_{(3)}, x_{(8)})$ is approximately a 90% confidence interval for $\xi_{1/2}$.
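The confidence coefficient of an interval formed by two order statistics is a binomial sum over $i = r, \ldots, s-1$, and is easily evaluated. The sketch below uses an illustrative function name and assumes Python 3.8 or later for `math.comb`.

```python
from math import comb

def quantile_interval_coverage(n, r, s, p=0.5):
    """Probability that (x_(r), x_(s)) covers the p-quantile: sum over i = r, ..., s-1."""
    return sum(comb(n, i) * p ** i * (1.0 - p) ** (n - i) for i in range(r, s))

# Sample of size 10, interval from the 3rd to the 8th order statistic, median (p = 0.5).
coverage = quantile_interval_coverage(10, 3, 8)
wider = quantile_interval_coverage(10, 2, 9)
print(round(coverage, 4), round(wider, 4))
```

Because the binomial distribution is discrete, only certain confidence coefficients are attainable for a given $n$; widening the interval by one order statistic on each side shows the next available level.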




20.3 Tolerance interval 

Let $L_1$ and $L_2$ be two functions of sample values with $L_1 < L_2$, such that the random interval $(L_1, L_2)$ has a probability $\beta$ of containing at least $100\gamma\%$ of the population; that is to say,

$$P\left[\int_{L_1}^{L_2} f(x)\,dx \ge \gamma\right] = P[F(L_2) - F(L_1) \ge \gamma] = \beta. \qquad (20.4)$$

This interval $(L_1, L_2)$ is called a $100\gamma\%$ tolerance interval with probability $\beta$. The functions $L_1$ and $L_2$ are called the lower and upper tolerance limits. For general statistics $L_1$ and $L_2$, the probability $\beta$ in (20.4) depends on the form of the distribution $F(x)$. Hence generally tolerance intervals are not distribution-free. However, if each of $L_1$ and $L_2$ is one of the order statistics, then the probability $\beta$ does not depend on the form of $F(x)$; i.e., tolerance intervals based on order statistics are distribution-free tolerance intervals.

The equation

$$P[F(x_{(s)}) - F(x_{(r)}) \ge \gamma] = \sum_{i=0}^{s-r-1}\binom{n}{i}\gamma^i(1-\gamma)^{n-i} = \beta$$

determines $\beta$ as a function of $r$, $s$, $n$ and $\gamma$ only, with $L_1 = x_{(r)}$ and $L_2 = x_{(s)}$.

In mass production of articles, a certain amount of variation from the aimed-at value is inevitable. After production has started, statisticians often calculate tolerance intervals, on the basis of a sample drawn from the production line, which cover with given probability ($\beta$) a certain fraction ($\gamma$) of the items being produced.
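For tolerance limits taken as order statistics, the probability β is the binomial sum $\sum_{i=0}^{s-r-1}\binom{n}{i}\gamma^i(1-\gamma)^{n-i}$, which follows from the beta distribution of $F(x_{(s)}) - F(x_{(r)})$. The sketch below is illustrative only: the function name and the figures (a sample of 100 with the smallest and largest observations as limits, and γ = 0.95) are assumptions, not data from the text.

```python
from math import comb

def tolerance_probability(n, r, s, gamma):
    """Probability that (x_(r), x_(s)) contains at least a fraction gamma of the population."""
    return sum(comb(n, i) * gamma ** i * (1.0 - gamma) ** (n - i)
               for i in range(0, s - r))   # i = 0, 1, ..., s - r - 1

# Hypothetical case: n = 100, limits x_(1) and x_(100), coverage fraction 0.95.
beta = tolerance_probability(100, 1, 100, 0.95)
print(round(beta, 4))
```

In practice one would usually fix β and γ in advance and search for the smallest sample size $n$ for which the computed probability reaches β.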


20.4 Non-parametric tests for location 

We shall now consider some non-parametric tests for the location parameter (median) of a population or for comparing the location parameters of two populations. In these tests, we do not make the usual normality assumption(s) about the parent distribution(s).



One-sample sign test 
Consider a distribution which is continuous in the vicinity of the median $\theta$, i.e. is such that $P[x < \theta] = P[x > \theta] = 0.5$. Let $x_1, x_2, \ldots, x_n$ be a random sample from this distribution.

Suppose we are required to test the hypothesis

$$H_0: \theta = \theta_0 \quad (\text{some specified value})$$

against a one-sided ($H: \theta < \theta_0$ or $H: \theta > \theta_0$) or two-sided ($H: \theta \ne \theta_0$) alternative.

If the sample comes from the distribution with median $\theta_0$, then on the average half of the observations will be greater than $\theta_0$ and half smaller. We replace each observation greater than $\theta_0$ by a plus sign and each observation less than $\theta_0$ by a minus sign. Sample values equal to $\theta_0$ may be ignored, since they have zero probability due to continuity of the distribution at the median. We count the number of plus signs ($r$) and the number of minus signs ($s$), with $r+s \le n$. The distribution of $r$, given $r+s$, is binomial with $p = P[x > \theta_0]$. The number $r$ of plus signs may be used to test $H_0$, which is equivalent to testing for the binomial parameter, i.e. for $H_0: p = \frac12$.


One-tailed cumulative binomial probabilities are given in 
Table VI of Appendix B. The use of the test is illustrated in the 
following example. 

Example 20.2 Suppose that we want to test the hypothesis
that the median length ($\theta$) of ear-head of a variety of wheat is
$\theta_0 = 9.9$ cm against the alternative that $\theta \ne 9.9$ cm, with $\alpha = 0.05$,
on the basis of the following 20 ear-head measurements:

   9.3   8.8  10.7  11.5   8.2   9.7  10.3   8.6  11.3  10.7
    -     -     +     +     -     -     +     -     +     +

  11.2   9.0   9.8   9.3   9.9  10.3  10.0  10.1   9.6  10.4
    +     -     -     -     0     +     +     +     -     +

We first determine the signs, which are posted below the measurements.
We find 9 minus and 10 plus signs and one measurement
equal to $\theta_0 = 9.9$. So we are to test whether $r = 10$ (plus signs)
supports the hypothesis $H_0 : \theta_0 = 9.9$ or, equivalently, to judge how
likely 10 successes (the number of plus signs) are to occur in 19
trials from a binomial distribution with $p = 0.5$. The critical region
for the level $\alpha$ two-sided (equal-tailed) test is given by

$$r \ge r_{\alpha/2} \quad \text{or} \quad r \le r'_{\alpha/2},$$

where $r$ is the number of plus signs and $r_{\alpha/2}$ is the smallest and
$r'_{\alpha/2}$ the largest integer such that

$$\sum_{x=r_{\alpha/2}}^{n} \binom{n}{x} \left(\tfrac{1}{2}\right)^{n} \le \frac{\alpha}{2}$$

and

$$\sum_{x=0}^{r'_{\alpha/2}} \binom{n}{x} \left(\tfrac{1}{2}\right)^{n} \le \frac{\alpha}{2}.$$

From Table VI in Appendix B, we find that $r_{0.025} = 15$ and
$r'_{0.025} = 4$ for $n = 19$ and $p = 0.5$. Since for this example $r = 10$, the
null hypothesis is to be accepted.

In the above example, the critical region for the one-sided test
against the alternative $H : \theta > 9.9$ cm or $H : p > 0.5$ will be given
by $r \ge r_\alpha$, where $r_\alpha$ is the smallest integer such that

$$\sum_{x=r_\alpha}^{n} \binom{n}{x} \left(\tfrac{1}{2}\right)^{n} \le \alpha,$$

since under the alternative the sample will have an excess of plus
signs.

In the case of the other type of one-sided alternative, viz.
$H : \theta < 9.9$ cm or $H : p < 0.5$,
the level $\alpha$ critical region will be $r \le r'_\alpha$, where $r'_\alpha$ is the largest
integer such that

$$\sum_{x=0}^{r'_\alpha} \binom{n}{x} \left(\tfrac{1}{2}\right)^{n} \le \alpha.$$

If $r + s > 25$, then the normal approximation to the binomial may
be used to perform the test. In that case, the probability of $r$ or
fewer successes in $(r + s)$ trials will be approximately given under the
null hypothesis by $\Phi(\tau)$, where

$$\tau = \frac{r - (r+s)/2}{\tfrac{1}{2}\sqrt{r+s}} = \frac{r - s}{\sqrt{r+s}}.$$
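The counting and the look-up of critical values in Example 20.2 can be sketched in a few lines. This is only an illustration: the function names are hypothetical, and exact binomial tail sums with $p = 1/2$ stand in for Table VI.

```python
from math import comb

def sign_counts(data, theta0):
    """Numbers of plus and minus signs; values equal to theta0 are ignored."""
    plus = sum(1 for x in data if x > theta0)
    minus = sum(1 for x in data if x < theta0)
    return plus, minus

def upper_tail(n, k):
    """P[X >= k] for X ~ Binomial(n, 1/2)."""
    return sum(comb(n, x) for x in range(k, n + 1)) / 2**n

def two_sided_critical_values(n, alpha):
    """Largest lower and smallest upper critical value, each tail <= alpha/2."""
    upper = min(k for k in range(n + 2) if upper_tail(n, k) <= alpha / 2)
    lower = n - upper  # Binomial(n, 1/2) is symmetric about n/2
    return lower, upper

lengths = [9.3, 8.8, 10.7, 11.5, 8.2, 9.7, 10.3, 8.6, 11.3, 10.7,
           11.2, 9.0, 9.8, 9.3, 9.9, 10.3, 10.0, 10.1, 9.6, 10.4]
r, s = sign_counts(lengths, 9.9)
print(r, s, two_sided_critical_values(r + s, 0.05))  # 10 9 (4, 15)
```

Since $r = 10$ lies strictly between 4 and 15, the sketch agrees with the conclusion of the example.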


One-sample Wilcoxon signed-rank test 
This test will be more efficient for testing for the median of a
distribution than the sign test, provided the distribution is continuous
and symmetric, since this takes into account both the signs and
the magnitudes (ranks).

Consider a distribution which is continuous and symmetric. Let
$x_1, x_2, \ldots, x_n$ be $n$ random and independent observations from this
distribution. Suppose we are required to test the hypothesis about
the median ($\theta$),

$$H_0 : \theta = \theta_0.$$

Then, under $H_0$, the differences $d_i = x_i - \theta_0$ are independent and come
from a continuous distribution which is symmetric about zero. So
positive and negative $d_i$ of the same absolute value have equal
probabilities to occur. Rank the absolute differences, $|d_i|$, from
the smallest to the largest. Ignore 0's as they are assumed to
occur with zero probability. Tied ranks are given what would
be the usual average value of the ranks if there had been no ties.
Let $T^+$ be the sum of ranks of the positive $d_i$ and $T^-$ that of the
negative $d_i$. Then $T^+ + T^- = m(m+1)/2$, $m$ being the number of
non-zero $d_i$ and $m \le n$.

The null distributions of $T^+$ and $T^-$ are identical, each being
symmetric about $m(m+1)/4$ and ranging from 0 to $m(m+1)/2$. Also,
$T^- = m(m+1)/2 - T^+$. So test statistics based on $T^+$ and $T^-$ are
related and provide equivalent test criteria. Thus, for any constant $c$,

$$P[T^- \le c \mid \theta = \theta_0] = P\left[T^+ \ge \frac{m(m+1)}{2} - c \,\Big|\, \theta = \theta_0\right] = P[T^+ \le c \mid \theta = \theta_0].$$

It is more convenient to work with $T$, the smaller of the two
sums, and Table VII of Appendix B gives the left-hand critical
values for the random variable $T$, which is either $T^+$ or $T^-$ (as the
case may be). If $T_\alpha$ is such that $P[T \le T_\alpha] = \alpha$, the approximate
critical regions for size-$\alpha$ tests of $H_0 : \theta = \theta_0$ against different
alternatives are as given below:

Alternative                   Critical region

$H : \theta > \theta_0$       $T^- \le T_\alpha$
$H : \theta < \theta_0$       $T^+ \le T_\alpha$
$H : \theta \ne \theta_0$     $T^+ \le T_{\alpha/2}$ or $T^- \le T_{\alpha/2}$

For instance, in the case of the alternative $H : \theta > \theta_0$, we should
reject $H_0 : \theta = \theta_0$ if $T^+$ is too large or, equivalently, $T^-$ is too small.
Hence the above test rules.




For $n > 25$, $T$ is approximately normally distributed under $H_0$,
with

$$E(T) = n(n+1)/4, \qquad \mathrm{var}(T) = n(n+1)(2n+1)/24.$$

As already indicated, in practice, zero differences are ignored 
and n is adjusted to include only non-zero differences. 

Example 20.4 Consider the problem of Example 20.2. The signed
deviations of the observations from 9.9 cm, with the ranks of their
absolute values in parentheses, are:

-0.6 (9.5), -1.1 (14), 0.8 (11.5), 1.6 (18), -1.7 (19), -0.2 (3.5),
0.4 (6.5), -1.3 (15.5), 1.4 (17), 0.8 (11.5), 1.3 (15.5), -0.9 (13),
-0.1 (1.5), -0.6 (9.5), 0 (ignored), 0.4 (6.5), 0.1 (1.5), 0.2 (3.5),
-0.3 (5), 0.5 (8).

Here $T^+ = 99.5$, $T^- = 90.5$, so that $T = 90.5$.

From Table VII of Appendix B, for $n = 19$ (number of non-zero
deviations) and $\alpha = 0.05$ (for two-sided test), we have $T_\alpha = 46$. Since
$T^+$ and $T^-$ are both greater than $T_\alpha$, there is not sufficient evidence
to reject $H_0$.

In the case of the one-sided alternative $H : \theta < 9.9$ cm ($H : \theta > 9.9$
cm), we shall compare $T^+ = 99.5$ ($T^- = 90.5$) with the critical value
$T_\alpha = 46$, at the level $\alpha = 0.025$, and arrive at the same conclusion
that there is no ground for rejecting $H_0$ (in favour of the appropriate
one-sided alternative) since $T > T_\alpha$.
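The ranking with average ranks for ties is the step most easily got wrong by hand. A minimal sketch of the computation of $T^+$ and $T^-$ for the ear-head data of Example 20.2 follows. The helper names are hypothetical, and rounding the differences to 10 decimals is an assumption made only to suppress floating-point noise when detecting ties.

```python
def average_ranks(values):
    """Ranks 1..n of values; tied values receive the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1          # mean of ranks i+1, ..., j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def signed_rank_sums(data, theta0):
    """Return (T_plus, T_minus); zero differences are dropped."""
    d = [round(x - theta0, 10) for x in data]
    d = [v for v in d if v != 0]
    ranks = average_ranks([abs(v) for v in d])
    t_plus = sum(r for v, r in zip(d, ranks) if v > 0)
    t_minus = sum(r for v, r in zip(d, ranks) if v < 0)
    return t_plus, t_minus

lengths = [9.3, 8.8, 10.7, 11.5, 8.2, 9.7, 10.3, 8.6, 11.3, 10.7,
           11.2, 9.0, 9.8, 9.3, 9.9, 10.3, 10.0, 10.1, 9.6, 10.4]
print(signed_rank_sums(lengths, 9.9))  # (99.5, 90.5)
```

The two sums add up to $19 \times 20 / 2 = 190$, which is a convenient check on the ranking.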

Now, we consider some non-parametric tests for comparing the 
location parameters of two distributions. First, we consider the sign 
test, which may be regarded as an alternative to the paired-t test. 


Paired-sample sign test
The single-sample sign test procedure can be applied to paired-sample
data. We assume that we have a random sample of $n$ pairs

$$(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n),$$

giving the $n$ differences

$$d_i = x_i - y_i, \quad \text{for } i = 1, 2, \ldots, n.$$

It is assumed that the distribution of differences is continuous
in the vicinity of its median $\theta$; i.e., $P[d < \theta] = P[d > \theta] = \tfrac{1}{2}$ and
$P[d > \theta_0] = p$, as before.




All the procedures of the one-sample sign test will remain valid
for the paired-sample sign test, with $d_i$ playing the rôle of $x_i$ everywhere
in the former situation.

Example 20.3 Consider the problem posed in Example 17.11.

The gains in weight ($d_i$) for the ten boys are:

6, 8, 1, 3, 3, 1, 3, -2, 4, -2.

The null hypothesis is $H_0 : \theta = 0$ and the alternative is $H : \theta > 0$,
where $\theta$ is the median of the distribution of differences. There are
8 plus signs among 10 non-zero values. On the null hypothesis,
the expected number of plus signs among the differences in a sample
of 10 pairs is 5. The sampling distribution of the number of plus
signs is the binomial distribution with probability of a plus sign 0.5.
From Table VI of Appendix B, we find that the probability of 8 or
more plus signs is 0.055. So the null hypothesis is accepted at the
5% level.

For large sample size, say for n>25, the normal approximation 
to the binomial may be used. 
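The tail probability quoted in Example 20.3 can be checked directly from the binomial distribution with $p = 1/2$. The sketch below works from the counts only (8 plus signs out of 10 non-zero differences); the function name is an illustrative assumption.

```python
from math import comb

def sign_test_upper_p(plus, n):
    """P[number of plus signs >= plus] when each sign is + with probability 1/2."""
    return sum(comb(n, k) for k in range(plus, n + 1)) / 2**n

p_value = sign_test_upper_p(8, 10)
print(round(p_value, 3))  # 0.055
```

Because 0.055 exceeds 0.05, the one-sided test does not reject the null hypothesis at the 5% level, as stated in the example.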


Paired-sample Wilcoxon signed-rank test 

Here also our problem is the same as in the paired-sample sign test. 

This test for paired observations is more powerful than the 
sign test for paired observations since the former takes account of the 
magnitude of the difference between the members of a pair. 

The observed differences $d_i = x_i - y_i - \theta_0$ are ranked in increasing
order of absolute magnitude and then the ranks are given the signs
of the corresponding differences.

We assume that the differences $d_1, d_2, \ldots, d_n$ are mutually
independent and that all $d_i$ come from continuous distributions (not
necessarily the same) that are symmetric about zero.

If the null hypothesis were true, then we would expect the sum
of the positive ranks to be approximately equal to the absolute value
of the sum of the negative ranks.

All the procedures of the one-sample signed rank test will remain
valid for the paired-sample signed rank test, with $d_i = x_i - y_i - \theta_0$.

In both these tests for paired samples, we test for the median 
of the distribution of differences, which is not necessarily the same as 
the difference of the two medians. 




Example 20.4 Consider the problem of Example 20.3. The
differences and signed ranks are as follows:

$d_i$           6     8     1     3     3     1     3    -2     4    -2
Signed rank     9    10    1.5    6     6    1.5    6   -3.5    8   -3.5

Here $T^+ = 48$ and $T^- = 7$; here $T^-$ will be used. In Table VII
of Appendix B, we have, for $n = 10$ and $\alpha = 0.01$ (one-sided), $T_\alpha = 5$.
Since $T^- > T_\alpha$, we may conclude that there is not sufficient evidence
to reject the null hypothesis that there is no effect of diet in favour
of the (one-sided) alternative hypothesis at the 1% level.


Two-sample Wilcoxon rank-sum test 

The sign test and the signed-rank test are applicable when the
observations are paired. A useful test procedure for the case when
the observations relate to two independent samples (not paired), for
testing whether the location parameters (medians) are the same in the
populations, i.e. for testing $H_0 : \theta_1 = \theta_2$, is given by the rank-sum test
of Wilcoxon or, equivalently, by the U-test of Mann and Whitney.

For the Wilcoxon rank-sum test, we assume that there are two
continuous distributions (one of $x$ and the other of $y$) which differ
only in their locations (medians). The distribution of $x$ has median
$\theta_1$ and that of $y$ has median $\theta_2$. We set $\Delta = \theta_2 - \theta_1$, and this, known
as the shift, is the parameter of interest.

We obtain $N = n_1 + n_2$ mutually independent observations, say
$x_1, x_2, \ldots, x_{n_1}$ and $y_1, y_2, \ldots, y_{n_2}$, from the two distributions.

The appropriate null hypothesis in this case is that the two
distributions are identical and the alternative is that the $y$ distribution
is shifted to the right (or left) or that the distributions differ
only in location. In terms of $\Delta$, the problem becomes that of
testing $H_0 : \Delta = 0$ versus the alternative $H : \Delta > 0$ (or $H : \Delta < 0$) or
$H : \Delta \ne 0$.

We order the $N = n_1 + n_2$ observations from least to greatest and
rank them from 1 to $N$. If there are ties, we use average ranks.
We denote the sum of ranks assigned to the $y$'s by $W$, which is the
Wilcoxon rank-sum statistic. If the two distributions are identical,




i.e. if $\Delta = 0$, then we expect the two samples will intermingle in a
regular way. On the other hand, if $\Delta > 0$, most of the higher ranks
will be occupied by the $y$ observations and hence $W$ will be large.
Similarly, small values of $W$ will indicate that $\Delta < 0$, whereas too
small or too large values of $W$ will be more probable if $\Delta \ne 0$.

Since the sum of ranks of all $N$ observations is $N(N+1)/2$, the sum
of ranks assigned to the $x$'s will be $W' = N(N+1)/2 - W$. When
$H_0 : \Delta = 0$ is true, the distribution of $W$ is symmetric about its
mean, $n_2(N+1)/2$.

For a one-sided test of $H_0 : \Delta = 0$ against $H : \Delta > 0$ at the level $\alpha$, we
reject $H_0$ if $W \ge w_{\alpha;\, n_1, n_2}$, where $w_{\alpha;\, n_1, n_2}$ is the upper $100\alpha\%$
point of the null distribution of $W$ with sample sizes $n_1$ and $n_2$.
Values of $w_{\alpha;\, n_1, n_2}$ are given in [2]. We reject $H_0$ at the level $\alpha$ in
favour of the alternative $H : \Delta < 0$ if $W \le n_2(N+1) - w_{\alpha;\, n_1, n_2}$.

For a two-sided test, we reject $H_0$ at the level $\alpha$ if

$$W \ge w_{\alpha_1;\, n_1, n_2} \quad \text{or} \quad W \le n_2(N+1) - w_{\alpha_2;\, n_1, n_2},$$

with $\alpha_1 + \alpha_2 = \alpha$.

For large samples, we define, as our test statistic,

$$W^* = \frac{W - n_2(N+1)/2}{\sqrt{n_1 n_2 (N+1)/12}}, \qquad (20.6)$$

which has, under $H_0$, an asymptotic $N(0, 1)$ distribution as
$\min(n_1, n_2) \to \infty$, and perform a large-sample normal deviate test.

To test $H_0 : \Delta = \Delta_0$, some specified non-zero number, we compute
$W$ using $x_i$ and $y_j' = y_j - \Delta_0$ (instead of $y_j$), and then the previous
procedures remain valid.
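A minimal sketch of the rank-sum computation is given below. It assumes the convention that $W$ is the sum of the ranks of the $y$'s, with the large-sample deviate $W^* = [W - n_2(N+1)/2] / \sqrt{n_1 n_2 (N+1)/12}$; no correction of the variance for ties is attempted. The function name and the small data set are purely illustrative.

```python
from math import sqrt

def rank_sum(x, y):
    """Return (W, W_star): rank sum of the y's and its normal deviate."""
    combined = sorted(x + y)
    # Collect the 1-based positions of each distinct value, then average them
    positions = {}
    for pos, v in enumerate(combined, start=1):
        positions.setdefault(v, []).append(pos)
    avg_rank = {v: sum(p) / len(p) for v, p in positions.items()}

    n1, n2 = len(x), len(y)
    N = n1 + n2
    W = sum(avg_rank[v] for v in y)
    W_star = (W - n2 * (N + 1) / 2) / sqrt(n1 * n2 * (N + 1) / 12)
    return W, W_star

# Hypothetical toy data: the y's take ranks 2 and 4, so W = 6
W, W_star = rank_sum([1, 3], [2, 4])
print(W, round(W_star, 4))
```

For samples as small as this the normal deviate is of no practical use; it is printed only to show how the two quantities are related.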
Two-sample Mann-Whitney U-test 
Mann and Whitney proposed a test with the same assumptions 
as those of Wilcoxon. The test statistics are different, but the tests 
are equivalent. Wilcoxon’s test uses ranks while the Mann-Whitney 
test makes use of a count. 


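For testing $H_0 : \Delta = 0$, Mann and Whitney defined the statistic
as

$$U = \sum_{i=1}^{n_1} \sum_{j=1}^{n_2} \phi(x_i, y_j), \qquad (20.7)$$

where

$$\phi(x_i, y_j) = \begin{cases} 1 & \text{if } x_i < y_j \\ 0 & \text{otherwise.} \end{cases}$$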




$U$ is the number of times an $x_i$ precedes a $y_j$ among all $(x_i, y_j)$
pairs. Mann and Whitney showed that in case there are no ties,
$W = U + n_2(n_2+1)/2$, and this implies that the two tests are
equivalent.

If we denote by $U'$ the number of $(x_i, y_j)$ pairs for which $x_i > y_j$,
then assuming no $x = y$ ties, we have $U + U' = n_1 n_2$.

When $H : \Delta > 0$ is true, $U$ will tend to be larger than $U'$. Hence
we reject $H_0 : \Delta = 0$ in favour of $H : \Delta > 0$ for large values of $U$ or,
equivalently, for large values of $W$ or for small values of $U'$.

Similarly, for testing $H_0 : \Delta = 0$ against $H : \Delta < 0$, we reject $H_0$
for small values of $U$, while for the two-sided alternative we reject
$H_0$ for small values of $\min(U, U')$. The selection of $U$, $U'$ or
$\min(U, U')$ depends on the type of alternative. If the computed
value of the appropriate $U$ is less than or equal to the tabulated value,
then we reject the null hypothesis at the stated level of significance.
We can use Table VIII of Appendix B for $n_2$ (size of larger sample)
between 9 and 20 and $n_1 \le 20$. For small values of $n_1$, $n_2$ (none
larger than 8), Mann and Whitney have given a table of exact
probabilities.

For large samples, we define, as our test statistic,

$$z = \frac{U - n_1 n_2 / 2}{\sqrt{n_1 n_2 (N+1)/12}}, \qquad (20.8)$$

which has an asymptotic $N(0, 1)$ distribution as $\min(n_1, n_2) \to \infty$,
and perform a large-sample normal deviate test.

Example 20.5 The following are the marks secured by two batches
of salesmen in the final test taken after completion of training.
Use the U-test with $\alpha = 0.02$ for the null hypothesis that the samples
are drawn from identical distributions against the alternative that
the distributions differ in location only.

Batch A: 26, 27, 31, 26, 19, 21, 20, 25, 30;
Batch B: 23, 28, 26, 24, 22, 19.

Here $n_1 = 6$, $n_2 = 9$, $N = n_1 + n_2 = 15$
and $W = 77.5$, $U = 31$, $U' = 20$.

From Table VIII of Appendix B, we find that for (size of larger
sample) $n_2 = 9$ and (size of smaller sample) $n_1 = 6$ for a two-tail test
at the level 0.02, the critical value is 7. Since 20 (the smaller of $U$
and $U'$ for this problem) is greater than 7, we have no reason to
believe that the samples are not drawn from identical distributions.
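The counts $U$ and $U'$ of Example 20.5 can be reproduced by direct enumeration of the $n_1 n_2$ pairs. In the sketch below, Batch B is treated as the $x$ sample and Batch A as the $y$ sample; this assignment is an assumption, chosen because it reproduces the quoted values. Tied pairs contribute to neither count.

```python
def mann_whitney_counts(x, y):
    """Return (U, U'): numbers of (x_i, y_j) pairs with x_i < y_j and x_i > y_j."""
    u = sum(1 for xi in x for yj in y if xi < yj)
    u_prime = sum(1 for xi in x for yj in y if xi > yj)
    return u, u_prime

batch_a = [26, 27, 31, 26, 19, 21, 20, 25, 30]  # taken as the y sample, n2 = 9
batch_b = [23, 28, 26, 24, 22, 19]              # taken as the x sample, n1 = 6

u, u_prime = mann_whitney_counts(batch_b, batch_a)
print(u, u_prime, min(u, u_prime))  # 31 20 20
```

Here $U + U' = 51$ rather than $n_1 n_2 = 54$ because three pairs are tied; the identity $U + U' = n_1 n_2$ holds only in the absence of $x = y$ ties.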


Two-sample median test 

Like the U-test, the median test* is also meant for the null hypo- 
thesis that two independent samples are from identical distributions 
against the alternative that they have different location parameters 
(medians). The alternative may also be one-sided (meaning that 
the median of one distribution is greater than that of the other). 
This test is sensitive to differences in location. The test may be 
used with data presented in at least an ordinal scale. 

Let there be two samples of sizes $n_{10}$ and $n_{20}$ from the two distributions
under comparison. We order the $N = n_{10} + n_{20}$ observations in
the two samples combined and determine the combined sample
median $\hat\theta$. Then we count the number of observations in each of the
two samples that lie below $\hat\theta$ and the number of those which do not
lie below $\hat\theta$. These can be put in a 2 x 2 contingency table:

                 Observations
             $< \hat\theta$    $\ge \hat\theta$    Total
Sample 1     $n_{11}$          $n_{12}$            $n_{10}$
Sample 2     $n_{21}$          $n_{22}$            $n_{20}$
Total        $n_{01}$          $n_{02}$            $N$

If $n_{10}$ and $n_{20}$ are small, one may obtain the exact probability of
the above 2 x 2 table with fixed marginals by equation (19.3), viz.

$$P = \frac{n_{10}!\, n_{20}!\, n_{01}!\, n_{02}!}{N!\, n_{11}!\, n_{12}!\, n_{21}!\, n_{22}!}, \qquad (20.9)$$

whereas for moderately large $n_{10}$, $n_{20}$ (say each greater than 10), we
may use the (frequency) $\chi^2$ statistic with 1 degree of freedom, where

$$\chi^2 = \frac{(n_{11} n_{22} - n_{12} n_{21})^2 N}{n_{10}\, n_{20}\, n_{01}\, n_{02}} \qquad (20.10)$$

{vide formula (19.38)}.
* Some also call this the sign test for two independent samples.




Example 20.6 Perform the median test for the data shown in
Example 20.5.

In this case, the sample median (combined) = 25, and we have
the following table:

             < 25    >= 25    Total
Batch A        3       6        9
Batch B        4       2        6
Total          7       8       15

The exact probability of getting such a table, by (20.9), is
0.1958. The probability of getting less likely distributions (in one
direction) than the observed one will be obtained from the
following two tables:

             < 25    >= 25    Total
Batch A        2       7        9
Batch B        5       1        6
Total          7       8       15

Probability = 0.0336

and

             < 25    >= 25    Total
Batch A        1       8        9
Batch B        6       0        6
Total          7       8       15

Probability = 0.0014

The exact probability of getting the observed table or more
extreme tables in one direction is, therefore,

0.1958 + 0.0336 + 0.0014 = 0.2308.

So the exact probability of getting the observed distribution or less
likely ones in either direction is, by symmetry,

2 x 0.2308 = 0.4616.

As 0.4616 > 0.05, we accept the null hypothesis that the two distributions
are identical against the two-sided alternative. Also the
observed table would suggest acceptance of the null hypothesis
against a one-sided alternative since 0.2308 > 0.05.




The chief objection to the exact method is the computational 
labour involved. But in cases where the probability of the observed 
table itself exceeds the level of significance (as in the present case, 
where it is 0-1958), we need not obtain the probabilities of more 
extreme cases, since the sum of probabilities of the observed table 
and the more extreme cases will also exceed the level of significance. 
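The computational labour of the exact method is slight once the table probability is written in terms of binomial coefficients, $P = \binom{n_{10}}{n_{11}}\binom{n_{20}}{n_{21}} / \binom{N}{n_{01}}$, which is algebraically the same as the factorial form. The sketch below applies this to the tables of Example 20.6; the cell counts are those obtained by splitting the data of Example 20.5 at the combined median 25, and the function name is hypothetical.

```python
from math import comb

def table_probability(n11, n12, n21, n22):
    """Probability of a 2x2 table with all marginal totals held fixed."""
    n10, n20 = n11 + n12, n21 + n22
    n01 = n11 + n21
    return comb(n10, n11) * comb(n20, n21) / comb(n10 + n20, n01)

# Observed table: Batch A has 3 below and 6 not below the median,
# Batch B has 4 below and 2 not below.
observed = table_probability(3, 6, 4, 2)

# Observed table plus the two more extreme tables in the same direction
# (first cell a = 3, 2, 1 with margins 9, 6 and 7, 8 fixed)
one_tail = sum(table_probability(a, 9 - a, 7 - a, a - 1) for a in (3, 2, 1))

print(round(observed, 4), round(one_tail, 4))  # 0.1958 0.2308
```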


20.5 Two-sample non-parametric tests for dispersion 

Let $x_1, x_2, \ldots, x_{n_1}$ and $y_1, y_2, \ldots, y_{n_2}$ be two samples of independent
observations drawn from two populations with distribution
functions $F$ and $G$, respectively. We assume $F$ and $G$ are continuous,
have the same median or the difference of the two medians is
known, so that the data can be adjusted to have common population
medians. Thus $F$ and $G$ differ only in the scale parameter.

We are interested in the null hypothesis $H_0 : F(x) = G(x)$, for all $x$,
against the alternative $H : F(x) = G(\theta x)$, for all $x$ and some $\theta > 0$
but $\theta \ne 1$. This is called the scale alternative. The $x$ population is
more (less) dispersed than the $y$ population if $\theta < 1$ ($\theta > 1$).


Mood’s rank test for dispersion 

In this test, we make the assumption that $F$ and $G$ have equal
medians (which may or may not be known). First, we rank the
$N = n_1 + n_2$ observations from the two samples and denote by $r_i$ the
rank of $y_i$ in this combined ranking. Then the Mood test is
performed using the following statistic based on the sum of squares
of the deviations of $r_i$ ($y$ ranks) from the average combined rank,
$(N+1)/2$:

$$M_N = \sum_{i=1}^{n_2} \left( r_i - \frac{N+1}{2} \right)^2. \qquad (20.11)$$

If the $x$'s are more (less) dispersed than the $y$'s, i.e. if $H : \theta < 1$
($H : \theta > 1$), then $M_N$ will be relatively small (large). For the two-sided
alternative $H : \theta \ne 1$, on the other hand, we reject $H_0$ if $M_N$ is
either too large or too small.

Under $H_0$,

$$E(M_N) = n_2(N^2 - 1)/12$$

and

$$\mathrm{var}(M_N) = n_1 n_2 (N+1)(N^2 - 4)/180. \qquad (20.12)$$

The exact null distribution of $M_N$ can be derived in small
samples by enumeration. The null distribution of $M_N$ is symmetric
for $n_1 = n_2$. For large samples, the normal approximation with the
mean and variance given in (20.12) can be used to get the critical
values.
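A short sketch of the Mood statistic, with its null mean and variance, may help fix the definitions. It assumes that $M_N$ is the sum of squared deviations of the $y$ ranks from $(N+1)/2$, with $E(M_N) = n_2(N^2-1)/12$ and $\mathrm{var}(M_N) = n_1 n_2 (N+1)(N^2-4)/180$ under the null hypothesis. It also assumes no tied observations; the function name and the toy data are illustrative only.

```python
def mood_statistic(x, y):
    """Return (M_N, null mean, null variance); assumes no ties."""
    combined = sorted(x + y)
    n1, n2 = len(x), len(y)
    N = n1 + n2
    rank = {v: i for i, v in enumerate(combined, start=1)}
    m = sum((rank[v] - (N + 1) / 2) ** 2 for v in y)
    mean = n2 * (N**2 - 1) / 12
    var = n1 * n2 * (N + 1) * (N**2 - 4) / 180
    return m, mean, var

# Hypothetical toy data: the y's occupy the two middle ranks, so M_N is small
m, mean, var = mood_statistic([1, 6], [3, 4])
print(m, mean, round(var, 4))
```

With the $y$'s concentrated in the middle of the ordering, $M_N = 0.5$ falls below its null mean 2.5, which is the direction expected when the $x$'s are the more dispersed.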


Sukhatme's test for dispersion
This test can be used only with a knowledge of both the
population medians. Even knowledge of the difference of medians
is not enough. For, in deriving the test statistic, we are to adjust the
observations ($x$'s and $y$'s) so that both the populations have zero
medians. Mood's test has an advantage in this regard.

We may, however, in large samples adjust the observations by
subtracting the respective sample medians from the observations.
As sample medians converge to the population medians, this will
not introduce much error in large-sample tests. However, the exact
distribution of the test statistic will change.

As this test is used only when the populations do have zero
medians, the test statistic is defined as

$$T = \frac{1}{n_1 n_2} \sum_{i=1}^{n_1} \sum_{j=1}^{n_2} \psi(x_i, y_j), \qquad (20.13)$$

where

$$\psi(x, y) = \begin{cases} 1 & \text{if either } 0 < x < y \text{ or } y < x < 0 \\ 0 & \text{otherwise.} \end{cases}$$

The mean and variance of $T$ under $H_0 : \theta = 1$ are

$$E(T) = 1/4, \qquad \mathrm{var}(T) = (N+7)/48 n_1 n_2. \qquad (20.14)$$

The null distribution of $T$ may be found by enumeration and an
exact test performed. If $y$'s are more (less) dispersed than $x$'s, then
$T$ will be relatively large (small). For the two-sided alternative
($\theta \ne 1$), we reject $H_0 : \theta = 1$ in the case of too large or too small
values of $T$.

For large samples, an approximate normal deviate test can be
performed using the parameters (20.14).
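The indicator in Sukhatme's statistic is simple to evaluate pair by pair. The sketch below assumes observations already adjusted to have zero medians, with $T$ equal to the proportion of $(x_i, y_j)$ pairs for which either $0 < x_i < y_j$ or $y_j < x_i < 0$, and with null mean $1/4$ and variance $(N+7)/(48 n_1 n_2)$. The function name and the four-point data set are hypothetical.

```python
def sukhatme_statistic(x, y):
    """Return (T, null mean, null variance) for median-adjusted samples."""
    n1, n2 = len(x), len(y)
    count = sum(1 for xi in x for yj in y
                if 0 < xi < yj or yj < xi < 0)
    t = count / (n1 * n2)
    mean = 1 / 4
    var = (n1 + n2 + 7) / (48 * n1 * n2)
    return t, mean, var

# Hypothetical toy data: each y lies further from zero than the x on its side
t, mean, var = sukhatme_statistic([1, -1], [2, -3])
print(t, mean, round(var, 4))
```

Here $T = 0.5$ exceeds its null mean, the direction associated with the $y$'s being the more dispersed.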




20.6 A general non-parametric test for two independent samples

In the above two sections, we have considered two-sample non-parametric
tests for special alternatives, e.g. that the two distributions
differ in respect of location only or dispersion only. In this section,
we consider a general two-sample test for testing the null hypothesis 
that two independent samples come from identical distributions 
against the alternative that the two distributions differ in any manner 
whatsoever—in location, in dispersion, in skewness, in kurtosis or in 
any other respect. Another test of this type, the Kolmogorov-Smirnov 
two-sample test, will be considered in the next section.

If we want to test whether the two distributions differ only in one 
particular respect, say either in location or in dispersion, then we 
should use a test for location or dispersion. The test being considered 
here will be less powerful in disclosing differences of a particular 
kind, for it is a test for any sort of difference and not for a particular 
type of difference. 


Wald-Wolfowitz run test

When we wish to test whether two independent samples have been
drawn from the same distribution against the alternative that the
two distributions differ (in any manner), this test is used. The test
assumes that the underlying population distribution is continuous.

Let us draw a random sample (observations denoted by $x$'s) of
size $n_1$ from the first distribution and a random sample (observations
denoted by $y$'s) of size $n_2$ from the second distribution. We then
arrange the $N = n_1 + n_2$ observations from the two samples combined
in order of magnitude. Thus we might have the arrangement:

y x x x y y ...

A run is a sequence of identical letters preceded and followed by a
different letter or by no letter at all. Thus, in the above sequence,
we have a run of one $y$ followed by a run of three $x$'s; this in turn is
followed by a run of two $y$'s; and so on. Let $r$ be the total number
of runs in the group of $N$ observations. Then if the two samples are
from the same distribution, the two samples are expected to be
thoroughly mixed and $r$ is expected to be large; whereas $r$ is
expected to be relatively small if the distributions are not the same.
Hence we reject $H_0$ if $r$ is too small.
To obtain the sampling distribution of $r$, we observe that there
are $\binom{N}{n_1} = \binom{N}{n_2}$ different possible arrangements of the $n_1$ $x$'s and the
$n_2$ $y$'s in a line, and all these arrangements are equally likely under
the null hypothesis. Next we find the number of arrangements of
$n_1$ $x$'s and $n_2$ $y$'s giving a total of $r$ runs. Let $r = 2d$ (even); then we
must have $d$ runs of $x$'s and $d$ runs of $y$'s. To get $d$ runs of $x$'s we are
to divide the $n_1$ $x$'s in $d$ groups and find all ordered $d$-part partitions
of $n_1$ things. This is obtained (with the help of generating functions)
as follows: The required number of $d$-part partitions of $n_1$ things is
the coefficient of $t^{n_1}$ in

$$(t + t^2 + t^3 + \cdots)^d = t^d (1 - t)^{-d} = t^d \sum_{j=0}^{\infty} \binom{d-1+j}{d-1} t^j,$$

and is $\binom{n_1 - 1}{d - 1}$. In a similar way, the number of $d$-part partitions of
$n_2$ $y$'s is $\binom{n_2 - 1}{d - 1}$. Hence the total number of ways of getting $r = 2d$
runs is $2 \binom{n_1 - 1}{d - 1} \binom{n_2 - 1}{d - 1}$, since $d$ runs of $x$'s and $d$ runs of $y$'s can be
combined in two ways to give $r = 2d$.

So

$$P[r = 2d] = \frac{2 \binom{n_1 - 1}{d - 1} \binom{n_2 - 1}{d - 1}}{\binom{N}{n_1}}. \qquad (20.15)$$

By a similar argument, we have, for $r$ odd,

$$P[r = 2d + 1] = \frac{\binom{n_1 - 1}{d} \binom{n_2 - 1}{d - 1} + \binom{n_1 - 1}{d - 1} \binom{n_2 - 1}{d}}{\binom{N}{n_1}}. \qquad (20.16)$$

To perform the test of the null hypothesis at the level $\alpha$, we
find $r_0$ (as large as possible) such that

$$P(r \le r_0) \le \alpha$$

and reject $H_0$ if the observed $r$ does not exceed $r_0$.




Tables of critical values of $r$, based on (20.15) and (20.16), are
given by Swed and Eisenhart. Any value of $r$ which is equal to or
smaller than that shown in Table IX of Appendix B is significant
at the 0.05 level.
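For sample sizes outside a table, the null distribution of the number of runs can be evaluated directly. The sketch below counts runs in a sequence of labels and computes $P[r = 2d] = 2\binom{n_1-1}{d-1}\binom{n_2-1}{d-1}/\binom{N}{n_1}$ and $P[r = 2d+1] = [\binom{n_1-1}{d}\binom{n_2-1}{d-1} + \binom{n_1-1}{d-1}\binom{n_2-1}{d}]/\binom{N}{n_1}$. The function names are hypothetical, and the sizes 6 and 7 are used only as an illustration.

```python
from math import comb

def count_runs(labels):
    """Number of runs in a non-empty sequence of labels, e.g. 'yxxxyy'."""
    return 1 + sum(1 for a, b in zip(labels, labels[1:]) if a != b)

def run_pmf(r, n1, n2):
    """P[r runs] under the null hypothesis, for r >= 2."""
    total = comb(n1 + n2, n1)
    if r % 2 == 0:
        d = r // 2
        ways = 2 * comb(n1 - 1, d - 1) * comb(n2 - 1, d - 1)
    else:
        d = (r - 1) // 2
        ways = (comb(n1 - 1, d) * comb(n2 - 1, d - 1)
                + comb(n1 - 1, d - 1) * comb(n2 - 1, d))
    return ways / total

def run_cdf(r0, n1, n2):
    """P[r <= r0] under the null hypothesis."""
    return sum(run_pmf(r, n1, n2) for r in range(2, r0 + 1))

print(count_runs("yxxxyy"))          # 3
print(round(run_cdf(3, 6, 7), 4))    # lower-tail probability for r0 = 3
```

Summing `run_pmf` over all attainable values of $r$ gives 1, which is a useful check on the two formulas.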

For large values of $n_1$ and $n_2$, the sampling distribution
of $r$ is approximately normal with mean and variance given,
under $H_0$, by

$$E(r) = \frac{2 n_1 n_2}{N} + 1$$

and

$$\mathrm{var}(r) = \frac{2 n_1 n_2 (2 n_1 n_2 - N)}{N^2 (N - 1)}. \qquad (20.17)$$

An approximate test can be performed by using the above approximation
if (a) $n_1$ and $n_2$ are both larger than 10, or (b) either $n_1$ or $n_2$
is larger than 20.

Since the underlying distribution is assumed to be continuous, no
ties should occur. But in practice, due to measurement approximations,
ties may occur. If ties are within the same sample, then there
is no problem as the number of runs is not affected. But if there
be ties among observations from both the samples, we cannot get
a unique value of $r$. In that case we break ties in all possible
ways and find the corresponding $r$'s. If all these different $r$'s
lead to the same conclusion at the selected level $\alpha$, then there is
no problem.

There is, however, some difficulty when the different $r$'s associated
with different ways of breaking ties lead to different decisions.
Indeed, if the number of ties between observations from the two
samples is large, then the run test is not to be recommended.


Example 20.7 Scores on a clerical aptitude test administered to a
batch of 6 Secretariat and 7 Directorate clerks are given below. Test
whether the two groups of clerks have the same score distribution in
the population.


Scores for Secretariat clerks 4” 35 52. 60 46 55 




We arrange the scores from the two groups in order of magnitude,
noting the group to which each score belongs:

35 40 42 46 47 50 52 55 56 57 57 60 62
s soud isi ee es ee E 
Thus we have $n_1 = 6$, $n_2 = 7$, $N = 6 + 7 = 13$ and 4 runs of S's and 4 runs
of D's, giving $r = 8$. The critical value of $r$ at the 5% level, from
Table IX of Appendix B, is 3. Since the observed value (8) is greater
than the critical value (3), we accept the null hypothesis (that the
population score distributions are identical) at the 5% level.


20.7 Kolmogorov-Smirnov tests

We should consider here two test procedures that are comparable
to the $\chi^2$-test for goodness of fit and the $\chi^2$-test for homogeneity of
two distributions and have the merit that they can be applied even
to small samples.

The tests, due to the Russian statisticians Kolmogorov and 
Smirnov, are based on the notion of the empirical distribution function 


of a continuously distributed random variable. Let the observations
in a random sample of size $n$ from the distribution of the random
variable $x$ be arranged in increasing order of magnitude and let $x_{(i)}$
be the $i$th value in this arrangement. Now, suppose the function $F_n$
is defined by

$$F_n(x) = \begin{cases} 0 & \text{if } x < x_{(1)} \\ i/n & \text{if } x_{(i)} \le x < x_{(i+1)}, \text{ for } i = 1, 2, \ldots, n-1 \\ 1 & \text{if } x \ge x_{(n)}. \end{cases}$$

Thus $nF_n(x)$ is nothing but the number of sample observations that
are less than or equal to $x$. This function $F_n$ is called the empirical
distribution function of $x$. It may be noted that if $F$ be the distribution
function of $x$, then

$$P[F_n(x) = k/n] = \binom{n}{k} [F(x)]^k [1 - F(x)]^{n-k}, \quad k = 0, 1, \ldots, n, \qquad (20.18)$$

so that, for fixed $x$, $nF_n(x)$ is a binomially distributed random
variable with parameters $n$ and $F(x)$. This implies that

$$E[F_n(x)] = F(x)$$

and

$$\mathrm{var}[F_n(x)] = F(x)[1 - F(x)]/n.$$

It may also be noted that, for fixed $x$, $F_n(x)$ is an unbiassed and
consistent estimator of $F(x)$.


One-sample goodness of fit test 

Suppose we are required to test the hypothesis that $x$, which is
assumed to be continuously distributed, has a completely specified
distribution function, say $F_0$. Thus our problem is to test the hypothesis
$H_0 : F(x) = F_0(x)$, for each $x$, against the two-sided alternative.
An appropriate test criterion for this hypothesis is

$$D_n = \max_x |F_n(x) - F_0(x)|. \qquad (20.19)$$

Because of the results stated earlier for the empirical distribution
function, in case $F_0$ represents the true distribution of $x$, one expects
$D_n$ to be small, while a large value of $D_n$ may be taken to indicate
that the true distribution function of $x$ is not $F_0$. In other words,
one will reject $H_0$ if and only if the observed value of $D_n$ for the
given sample exceeds the upper $\alpha$-point of $D_n$. The critical values
of $D_n$, for $\alpha = 0.01$ and 0.05, will be found in Table 10.1 of Formulae
and Tables for Statistical Work.


Two-sample test

Suppose a random variable is continuously distributed in each
of two populations, the distribution functions being denoted by $F$
and $G$. Suppose further that independent random samples, say

$$x_1, x_2, \ldots, x_m$$

and

$$y_1, y_2, \ldots, y_n,$$

have been drawn from the two distributions. We shall denote by $F_m$
and $G_n$ the empirical distribution functions of the $x$-sample and the
$y$-sample, respectively.

Here our problem may be to test the hypothesis that the two
distributions are identical, i.e. the hypothesis $H_0 : F(t) = G(t)$, for
all $t$, against the two-sided alternative. Then an appropriate test
criterion will be the Kolmogorov-Smirnov statistic

$$D_{m,n} = \max_t |F_m(t) - G_n(t)|. \qquad (20.20)$$

If the hypothesis is true, one expects the value of $D_{m,n}$ to be small,
while a large value of $D_{m,n}$ may be taken as an indication that the
parent distributions are not identical. The upper $\alpha$-points ($\alpha = 0.01$
and 0.05) of this statistic are given in Table 10.2 of Formulae and
Tables for Statistical Work for the case $m = n$. More extensive are
the values presented in Table 55 of Biometrika Tables, Vol. 2. One
will reject $H_0$, at the chosen level of significance, if and only if
the observed value of $D_{m,n}$ exceeds the corresponding tabulated
value.
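Since both empirical distribution functions are step functions that change only at observed values, the maximum in $D_{m,n} = \max_t |F_m(t) - G_n(t)|$ need only be sought over the pooled sample points. A minimal sketch under that observation follows; the function name and the toy samples are illustrative assumptions.

```python
def ks_two_sample(x, y):
    """D_{m,n}: largest absolute gap between the two empirical d.f.'s."""
    m, n = len(x), len(y)
    d = 0.0
    for t in sorted(set(x) | set(y)):
        f_m = sum(1 for v in x if v <= t) / m   # F_m(t)
        g_n = sum(1 for v in y if v <= t) / n   # G_n(t)
        d = max(d, abs(f_m - g_n))
    return d

# Hypothetical toy samples
print(ks_two_sample([1, 3], [2, 4]))        # 0.5
print(ks_two_sample([1, 2, 3], [4, 5, 6]))  # 1.0
```

The second call shows the extreme case of completely separated samples, for which the statistic attains its largest possible value.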

A comparison between the Kolmogorov-Smirnov tests and the
corresponding $\chi^2$-tests is in order. Both types of test are distribution-free,
in the sense that the sampling distribution of the test statistic
does not depend on the true distribution (or distributions) of the
variable. But, unlike the $\chi^2$-tests, the K-S tests may be used only
when the population distributions are continuous. Also, while the
$\chi^2$-test for goodness of fit may be used even when the hypothesis
specifies the form of the population distribution only, the corresponding
K-S test requires that the hypothesis specify the distribution
completely. However, the K-S tests may be applied to samples
too small to justify the $\chi^2$-tests. They involve less computation.
Further, these tests are generally more sensitive (i.e. more powerful)
than the $\chi^2$-tests.

Example 20.8 Ten points are taken in an interval of length one
metre. The distance of each point from the start of the interval
is (in metres) as follows :

0.414  0.523  0.229  0.942  0.097
0.394  0.572  0.486  0.273  0.358

The points may be supposed to be chosen at random and
independently of each other if, and only if, the 10 observations form
a random sample from the rectangular distribution over the interval
[0, 1]. To examine whether this is borne out by the data, we shall
use the (one-sample) K-S test.

We note that the distribution function $F_0$ of the rectangular
distribution is

\[
F_0(x) =
\begin{cases}
0 & \text{if } x < 0, \\
x & \text{if } 0 \le x \le 1, \\
1 & \text{if } x > 1.
\end{cases}
\]


On the other hand, the empirical distribution function $F_n$ is as
shown below :

Interval of x values    F_n(x)    Interval of x values    F_n(x)
x < 0.097               0         0.414 ≤ x < 0.486       0.6
0.097 ≤ x < 0.229       0.1       0.486 ≤ x < 0.523       0.7
0.229 ≤ x < 0.273       0.2       0.523 ≤ x < 0.572       0.8
0.273 ≤ x < 0.358       0.3       0.572 ≤ x < 0.942       0.9
0.358 ≤ x < 0.394       0.4       x ≥ 0.942               1.0
0.394 ≤ x < 0.414       0.5

Hence

\[
D_n = \max_x |F_n(x) - F_0(x)|
\]

has the value 0.328 (corresponding to x = 0.572).

From Table 10.1 of Formulae and Tables for Statistical Work, we find
that for n = 10, the critical value at the 5% level of $D_n$ is 0.409.
Since the observed value of $D_n$ is smaller, we accept the hypothesis.
In other words, the observations may be supposed to be chosen at
random and independently of each other.
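
The computation of the one-sample K-S statistic in Example 20.8 can be sketched in a few lines of Python. This is only an illustration: the function name `ks_statistic_uniform` is an assumption of this sketch, and it uses the fact that, for a continuous hypothesised distribution, the largest deviation occurs at an observed value or immediately before it.

```python
def ks_statistic_uniform(sample):
    """One-sample K-S statistic against the rectangular distribution on [0, 1].

    Hypothetical helper for illustration only; here F_0(x) = x on [0, 1].
    """
    xs = sorted(sample)
    n = len(xs)
    d_n = 0.0
    for i, x in enumerate(xs, start=1):
        f0 = min(max(x, 0.0), 1.0)  # F_0(x), clipped outside [0, 1]
        # The empirical d.f. jumps from (i-1)/n to i/n at x: check both sides.
        d_n = max(d_n, abs(i / n - f0), abs((i - 1) / n - f0))
    return d_n


distances = [0.414, 0.523, 0.229, 0.942, 0.097,
             0.394, 0.572, 0.486, 0.273, 0.358]
d_n = ks_statistic_uniform(distances)
print(round(d_n, 3))  # 0.328, attained at x = 0.572
```

The value obtained is then compared with the tabulated 5% critical value (0.409 for n = 10), exactly as in the example.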


20.8 One-sample run test for randomness 

In order to arrive at some conclusion about a distribution on the
basis of a sample, we need a random sample. To test the hypothesis
that a sample is random, we use the order in which the observations
were originally obtained. The technique to be used here is based
on the theory of runs. For example, in sampling inspection, we have
runs of defective and those of non-defective items ; in tossing a coin,
we have runs of heads and those of tails. The total number of runs
in a sample of a given size indicates whether the sample is random
or not. Too few or too many runs may indicate a time trend or
systematic short-term cyclical fluctuations in the observations.

We find the number of runs (r) in the group of n observations in
the sample. The observations may be heads or tails in a coin-tossing
experiment, the good or bad items in a sampling inspection of items
from a lot or the measurements below the sample median (−) and
the measurements above the sample median (+).


The sampling distribution of r arising from random sampling
is known (vide Section 20.6). Using this, we decide whether a given
sample has more or fewer runs than expected under random
sampling.

Example 20.9 The following data were obtained on the sexes of
the 15 students standing in the queue in front of the admission
counter of a university. Ascertain whether the arrangement of sexes
was a random one or not.

Order of 15 males (m) and females (f) in the queue :

m f m m m f f m m m m f m f f

We have (using the notation of Section 20.6) $n_1 = 9$, $n_2 = 6$,
$N = 9 + 6 = 15$ and 4 runs of m's and 4 runs of f's, giving r = 4 + 4 = 8.
The critical value of r at the 5% level, from Table IX of Appendix B,
is 4. Since the observed value is greater than the critical value, we
accept the null hypothesis that in the queue the order of the sexes
was a random one.
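
Counting runs by hand is error-prone for long sequences, so a short Python sketch may help. It assumes the queue of Example 20.9 reads m f m m m f f m m m m f m f f; the helper name `count_runs` is hypothetical.

```python
from itertools import groupby


def count_runs(sequence):
    """Number of runs: maximal blocks of identical consecutive symbols."""
    return sum(1 for _ in groupby(sequence))


queue = "mfmmmffmmmmfmff"      # order of the 15 students in the queue
n1 = queue.count("m")          # number of males
n2 = queue.count("f")          # number of females
r = count_runs(queue)          # total number of runs
print(n1, n2, r)  # 9 6 8
```

The observed r is then referred to the tabulated critical value, as in the example.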


20.9 A non-parametric measure and tests of association 

Spearman's rank correlation coefficient (vide Section 14.2) is a
measure of association based on ranks. It can be used to test
whether two variables, say x and y, are independent. We need make
no assumption about the joint distribution of x and y. Like the
product moment correlation coefficient, its value also ranges from
−1 to +1. A value of +1 indicates perfect agreement and a value of
−1 indicates perfect disagreement between the two series of ranks.

The formula for calculating $r_R$ when there is no tie is given in
(14.1), while that for tied ranks is given in (14.3). Under the
assumption that the individuals, who were ranked, were randomly
drawn from some population, we can test the null hypothesis that
the two variables x and y are not associated in the population. If
the null hypothesis is true, then for a given rank order of the y
values (x values) all possible rank orders of the x values (y values)
are equally likely to occur. For n individuals, there are n! possible
rankings of x values, and since they are equally likely, the
probability of the occurrence of any particular ranking of the x values with
a given ranking of the y values is 1/n!.


The probability of the occurrence of any given value of $r_R$, under
$H_0$, is proportional to the number of permutations of x values giving
rise to that value of $r_R$, since to each ranking of y values there
corresponds a value of $r_R$.

For n = 2, only two values of $r_R$ are possible, viz. +1 and −1,
and each has the probability of occurrence 1/2 under $H_0$. For n = 3,
the possible values of $r_R$ are −1, −1/2, 1/2 and +1, with respective
probabilities of occurrence 1/6, 1/3, 1/3 and 1/6 under $H_0$. Table X of
Appendix B gives critical values of $r_R$ for n from 4 to 30, which have
been arrived at by a similar method. If an observed value of $r_R$
equals or exceeds the tabulated value, then that value of $r_R$ is
significant (for a one-sided test) at the stated level.
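
The enumeration behind these probabilities can be sketched in Python. The sketch assumes the usual no-tie formula, rank correlation = 1 − 6Σd²/{n(n² − 1)}, which is presumably what (14.1) states; the function name `spearman_null_distribution` is hypothetical.

```python
from collections import Counter
from fractions import Fraction
from itertools import permutations
from math import factorial


def spearman_null_distribution(n):
    """Exact null distribution of the rank correlation for n >= 2, no ties.

    Every ranking of the x values is taken as equally likely (probability 1/n!)
    against a fixed ranking 1, 2, ..., n of the y values.
    """
    base = range(1, n + 1)
    counts = Counter()
    for perm in permutations(base):
        d2 = sum((a - b) ** 2 for a, b in zip(base, perm))   # sum of squared rank differences
        counts[1 - Fraction(6 * d2, n * (n * n - 1))] += 1
    total = factorial(n)
    return {value: Fraction(c, total) for value, c in sorted(counts.items())}


dist3 = spearman_null_distribution(3)
for value, prob in dist3.items():
    print(value, prob)
```

Full enumeration is feasible only for small n, since the number of rankings grows as n!; this is why tabulated critical values or a large-sample approximation are used in practice.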

The table may be used for two-sided tests also. In this case,
the hypothesis of independence is rejected whenever $|r_R|$ is greater
than the tabulated value. The associated level of significance is
then double the value given in the table.

Kendall has shown that when n is 10 or larger, the statistic

\[
t = \frac{r_R \sqrt{n-2}}{\sqrt{1 - r_R^2}} \qquad (20.21)
\]

may be taken to be distributed as a Student's t with df = n − 2.

Example 20.10 For the data of Example 14.1, let us find out
whether the rankings by the two judges are independent or not.

We have already seen in Example 14.1 that $r_R = 0.394$, with n = 10.
By referring to Table X of Appendix B, we find that the 5% value
of $r_R$ for n = 10 is 0.564. So we accept the hypothesis of
independence in ranking at the 5% level of significance.

Example 20.11 Consider the data of Example 14.2 and test the
hypothesis of independence of the rankings by the supervisors.

Here, as already obtained in Example 14.2, $r_R = 0.956$ with n = 12.
Since n > 10, we may use here the large-sample test given by
(20.21) :

\[
t = \frac{0.956\sqrt{10}}{\sqrt{1 - (0.956)^2}}
  = 0.956 \Big/ \sqrt{\frac{1 - 0.913936}{10}}
  = 0.956 / \sqrt{0.0086064}
  = 0.956 / 0.09277 = 10.30, \quad \text{with df} = 10.
\]

This is significant at the 5% level. The same result is obtained
by performing the exact test and using Table X of Appendix B.
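
The arithmetic of the large-sample statistic (20.21) for this example can be checked with a short Python sketch; the function name `spearman_t` is an assumption of this illustration.

```python
import math


def spearman_t(r_rank, n):
    """Large-sample statistic (20.21): r * sqrt(n - 2) / sqrt(1 - r^2)."""
    return r_rank * math.sqrt(n - 2) / math.sqrt(1 - r_rank ** 2)


t = spearman_t(0.956, 12)          # rank correlation and n of Example 20.11
print(round(t, 2), "with df =", 12 - 2)
```

The value is referred to Student's t with n − 2 degrees of freedom.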


A quick test to detect correlation 

This test was developed by Olmstead and Tukey to detect
correlation between two random variables x and y. They called it
both a "corner test of association" and a "quadrant sum test".
This test places stress on very large and very small observations to
detect correlation. This is a quick test to apply.

The data consist of n independent observations $(x_1, y_1)$, $(x_2, y_2)$,
..., $(x_n, y_n)$ from a bivariate distribution. The test assumes that
the random variables x and y are continuously distributed. We
test whether the two variables x and y are independent against the
alternative that they are correlated.

To perform the test, we first make a scatter diagram of the n
points (x, y) on the (x, y) plane as in Fig. 12.3 and draw a
horizontal line through the median of y values, $y = y_{med}$, and draw a
vertical line through the median of x values, $x = x_{med}$, as illustrated
in the example below. The first and third quadrants are marked +
and the other two are marked −.

We next make some counts as follows : (1) Beginning at the
right-hand side of the scatter diagram, we count in decreasing order
of abscissae the observations until forced to cross the line $y = y_{med}$.
Attach a + sign (− sign) if those observations lie in the + quadrant
(− quadrant). (2) Next, we start from the left-hand side of the
scatter diagram and count in increasing order of abscissae the
observations until forced to cross the line $y = y_{med}$ and attach a sign
according to the rule stated above. (3) We repeat this process moving
downward from above and count in decreasing order of ordinates
the observations until forced to cross the line $x = x_{med}$ and attach
appropriate signs. (4) Next, we move up from below and count in
increasing order of ordinates the observations until forced to cross
the line $x = x_{med}$ and attach appropriate signs. It is recommended
that the last point in a count be ignored if the point is tied with
some point(s) on the other side of the median or lies exactly on the
median line.

We form the quadrant sum which equals the sum of the above
four counts with signs attached. The Olmstead-Tukey test statistic
T equals the absolute value of quadrant sum :

\[
T = |\text{quadrant sum}|. \qquad (20.22)
\]

We reject $H_0$ at the level α if T exceeds the upper 100α% point as
given by Olmstead and Tukey. For n > 14, we may use as the critical
values 10 (for α = 0.05) and 13 (for α = 0.01).
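
The four counts and the quadrant sum can be sketched in Python as follows. This is a simplified illustration under stated assumptions: there are no ties and no points on either median line (so the tie rule recommended above is not implemented), the sign of each count is taken from the side of the median line on which the counted points lie, and the names `signed_count` and `quadrant_sum` are hypothetical.

```python
from statistics import median


def signed_count(points, axis, descending, other_median, plus_if_high):
    """Count points in order along `axis` until the other median line is crossed."""
    other = 1 - axis
    ordered = sorted(points, key=lambda p: p[axis], reverse=descending)
    first_high = ordered[0][other] > other_median
    count = 0
    for p in ordered:
        if (p[other] > other_median) != first_high:
            break                       # forced to cross the median line
        count += 1
    return count if first_high == plus_if_high else -count


def quadrant_sum(points):
    x_med = median([x for x, _ in points])
    y_med = median([y for _, y in points])
    return (
        signed_count(points, 0, True, y_med, True)      # from the right: above y_med is a + quadrant
        + signed_count(points, 0, False, y_med, False)  # from the left: below y_med is a + quadrant
        + signed_count(points, 1, True, x_med, True)    # from the top: right of x_med is a + quadrant
        + signed_count(points, 1, False, x_med, False)  # from the bottom: left of x_med is a + quadrant
    )


increasing = [(1, 1), (2, 2), (3, 3), (4, 4), (5, 5), (6, 6)]
decreasing = [(1, 6), (2, 5), (3, 4), (4, 3), (5, 2), (6, 1)]
print(quadrant_sum(increasing), quadrant_sum(decreasing))  # 12 -12
```

The test statistic is the absolute value of the returned sum; the two toy data sets above are artificial and serve only to show the sign convention.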

Example 20.12 Consider the data of Example 12.1 and apply the
corner test for association.

The scatter diagram is drawn in Fig. 20.1 and the lines $x = x_{med}$
and $y = y_{med}$ are also drawn. Here $x_{med} = 154$ and $y_{med} = 360.5$. We
count the observations in the four directions (viz. right to left, left
to right, top to bottom and bottom to top) following the stated
rules and attach appropriate signs. These result in a quadrant
sum of 18. The test statistic is also T = |quadrant sum| = 18.
Here n = 20. At the 1% level, the critical value of T
(for n > 14) is 13. Hence we conclude that the null hypothesis of
independence of marks in college test and marks in university
examination cannot be accepted at the 1% level.

[Fig. 20.1 Carrying out the corner test (data of Example 12.1) :
scatter diagram of marks in university examination (Y) against
marks in college test (X), with the two median lines drawn.]

In the following table, we present the nature of the hypotheses
considered by the non-parametric (or distribution-free) tests in this
chapter together with their analogous parametric counterparts.

Nature of hypothesis         Non-parametric test(s)               Analogous parametric test(s)

Location parameter(s)
  One sample                 Sign ; Wilcoxon signed rank          Normal ; Student's t
  Paired sample              Sign ; Wilcoxon signed rank          Normal ; paired t
  Two independent samples    Wilcoxon ; Mann-Whitney ; median     Normal ; Fisher's t
Scale parameters
  Two independent samples    Mood ; Sukhatme                      F test
Association test
  One sample                 Spearman's rank correlation ;        Pearson's product moment
                             corner test                          correlation (Student's t)
Goodness of fit test         One-sample K-S test                  -
Test for randomness
  One sample                 Ordinary run test                    -
General one-sample test      Binomial test                        -
General two-sample test      Wald-Wolfowitz run test ;            -
                             two-sample K-S test


Questions and exercises 


20.1 State clearly the difference between a parametric and a
non-parametric problem. What is meant by the 'robustness' of a
statistical procedure ?

20.2 What are the differences between non-parametric,
parameter-free and distribution-free procedures ?


20.3 Describe how one can obtain a point estimate of a
population quantile and its confidence interval with confidence coefficient
1 − α.

20.4 What is meant by a 100γ% tolerance interval with
probability β ? What is a distribution-free tolerance interval and
how can it be obtained ?


20.5 Discuss the sign test for the location parameter of a
population. Show how it can be adapted to the case of paired samples.
Which one of the two tests for paired samples, (i) the sign test and
(ii) the Wilcoxon signed-rank test, is more powerful ? Explain why
it is so.

20.6 Describe the Mann-Whitney U-test for two independent
samples. What is the relation between the Wilcoxon statistic W and
the U statistic ? Give the large-sample approximation to the U-test.

20.7 Explain why no ties should theoretically occur in the 
discussion of a non-parametric test and yet why ties are found in 
practice. How are these dealt with ? 


20.8 Give some two-sample non-parametric tests for dispersion
along with their large-sample approximations.


20.9 Show that for a scale alternative F(x) = G(θx), if θ < 1, then
F(x) < G(x) for x > 0 and F(x) > G(x) for x < 0. Similarly, show that
if θ > 1, then F(x) > G(x) for x > 0 and F(x) < G(x) for x < 0.


20.10 In Sukhatme's test, along with T, define also

\[
T' = \sum_{i=1}^{n_1} \sum_{j=1}^{n_2} \phi(x_i, y_j),
\]

where $\phi(x, y) = 1$ if either 0 < y < x or x < y < 0
and $\phi(x, y) = 0$ otherwise.
Denote the number of x and y observations which are positive by
m and n, respectively. Assume there is no case where $x_i = y_j$ or $x_i = 0$,
$y_j = 0$. Then show that

\[
T + T' = mn + (n_1 - m)(n_2 - n).
\]


20.11 Discuss the Wald-Wolfowitz run test. Show how the 
theory of runs may be used to test for the randomness of a sample. 


20.12 Explain the use of Spearman’s rank correlation coefficient 
in a test of association. Give the large-sample approximation to 
the test. 


20.13 Describe the ‘corner test’ for association. 


20.14 To determine the mileage of a type of truck, 6 trucks were 
run and the mileage of each obtained with a gallon of gasoline was 
as follows : 

21, 19, 22, 18, 20, 24. 
Use the sign test to examine whether the average number of miles 
run with a gallon of gasoline by trucks of this type is 20, the alter- 
native hypothesis being that it is greater than 20. 
Ans. $H_0$ is accepted at the 5% level.


20.15 Quantile test. Consider the data of Example 20.2 and test
whether the 40th percentile of ear-head distribution is 9.1 cm.
against the alternative that it is different from 9.1 cm.

[Hints : Let r be the number of sample observations less than the
specified p-quantile value, while s observations are greater than it.
Then r has the binomial distribution with parameters n = r + s and p.
The critical value is based on this binomial distribution.

The sign test is a particular case of the quantile test with
p = 0.50.] Ans. $H_0$ is accepted at the 5% level.


20.16 Binomial test. A particular brand of TV set uses picture
tubes from suppliers if at least 85% of their tubes have a 'life' of at
least 600 hours. In a random sample of 20 tubes of supplier A, 16
survive this 'life test'. Should the TV manufacturer purchase this
brand A of tubes ?

[In the case of n observations dichotomised into categories
'success' and 'failure' (these terms are quite arbitrary), we can test
for the relevant population parameter θ (probability of success)
against one-sided or two-sided alternatives by considering the number
of successes as a binomial variable with parameters (n, θ). In the
problem r = 16, n = 20 and θ = 0.85. As P[r ≤ 16 | θ = 0.85] = 0.35,
$H_0$ : θ = 0.85 is accepted. Here the alternative is H : θ < 0.85.]
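
The lower-tail probability quoted in the hint can be reproduced with a small Python sketch. The function name `binomial_cdf` is an assumption of this illustration; it simply sums the binomial probabilities for 0, 1, ..., r successes.

```python
from math import comb


def binomial_cdf(r, n, theta):
    """P[number of successes <= r] for a binomial variable with parameters (n, theta)."""
    return sum(comb(n, k) * theta ** k * (1 - theta) ** (n - k)
               for k in range(r + 1))


p_value = binomial_cdf(16, 20, 0.85)   # 16 survivors out of 20, hypothesised theta = 0.85
print(round(p_value, 2))  # 0.35
```

Since this probability is well above the usual levels of significance, the sketch agrees with the conclusion stated in the hint.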




20.17 A manufacturer of electric bulbs claims that he has
developed a new production process which will increase the mean
efficiency (in suitable units) from the present value of 9.03. The
results obtained from an experiment with 15 bulbs from the new
process are given below :

 9.29    9.76    8.93
10.15   12.05    9.02
 8.69   12.38   10.87
11.25    9.08   10.00
11.47   10.25   11.56

Do we have reasons to believe that the efficiency has increased ?
Ans. $H_0$ is rejected at the 5% level.

20.18 Below are given the marks obtained by a group of
20 students in a subject in a college test and in the subsequent
public examination. Test at the 1% level whether the group has
improved its mean performance from the college test to the public
examination, by using (a) the sign test and (b) the Wilcoxon signed-
rank test.

              Marks obtained in                       Marks obtained in
Serial No.   college    public        Serial No.    college    public
             test       examination                 test       examination
    1        183        133               11        123        126
    2        175        193               12        121        141
    3        134        170               13        175        103
    4        170        164               14        133        126
    5        183        199               15        144        146
    6        167        160               16        109        155
    7        120        168               17        165        162
    8        175        158               18        144        161
    9        126        162               19        164        182
   10        187        176               20        125        119

Partial ans. (a) 11 plus signs ; accept $H_0$ ;
(b) T = 82, N = 20 ; critical value is 43.


20.19 A firm is advertising that it has been successful in
designing a new home automatic clothes washer which is more
effective in removing dirt than the most popular washer now in
use. And in support of its claim, it is also displaying the following
data of the dirt removed (in suitable units) by the most popular
washer and the new washer for 16 equally-sized and equally-soiled
loads of clothes which were washed with the same soap and for the
same length of time, 7 loads being washed by the popular washer :

         Dirt removed by
Popular washer    New washer
      13              10
      10              11
       9              12
      12              13
      11               9
      10              11
       8              14
                      12
                      13

Do you have reasons to believe that the firm's claim is genuine ?
(Use both the median test and the Mann-Whitney test.)
Partial ans. (a) Exact probability for the observed 2×2 table (for
median test) is 0.16 ;
(b) U' = 14, for N = 16.

20.20 In a class of 12 newly admitted students, the arrangement
of students with father living and those with father dead according
to the order of admission is :

L L D L L L D D L D L L,

where L denotes 'father living' and D denotes 'father dead'.

Does the above arrangement throw any doubt on the randomness
of arrangement of the two types of students ?

Ans. r = 7, $n_1 = 8$, $n_2 = 4$ ; $H_0$ is accepted at the 5% level.


20.21 Using Spearman's rank correlation, test for the association
between the judgments of the two judges in Exercise 14.7.
Partial ans. $H_0$ is rejected at the 5% level, but accepted
at the 1% level.


SUGGESTED READING 


[1] Auble, D. “Extended tables for the Mann-Whitney statistic”, 
Bulletin of the Institute of Educational Research at Indiana University, 
1, No. 2, 1953. 

[2] Hollander, M. and Wolfe, D.A. Nonparametric Statistical Methods
(Chs. 3, 4). John Wiley, 1973.

[3] Gibbons, J. D. Nonparametric Methods for Quantitative Analysis 
(Chs. 3—5, 7,8). Holt, Rinehart and Winston, 1976. 

[4] Mann, H.B. and Whitney, D.R. “On a test of whether one of 
two random variables is stochastically larger than the other”, 
Ann. Math. Stat., 18, pp. 52-54, 1947. 

[5] Mood, A. M., Graybill, F. A. and Boes, D.C. Introduction to
the Theory of Statistics (Ch. 11). McGraw-Hill, 1974, and
Kogakusha.

[6] Olds, E. G. “The 5% significance levels for sums of squares
of rank differences and a correction”, Ann. Math. Stat., 20, pp.
117-118, 1949.

[7] Olmstead, P. S. and Tukey, J. W. “A corner test for associa- 
tion”, Ann. Math. Stat., 18, pp. 495-513, 1947. 

[8] Siegel, S. Nonparametric Statistics for the Behavioral Sciences (Chs. 
4—6, 9). McGraw-Hill, 1956. 

[9] Sukhatme, B. V. “On some two sample non-parametric tests 
for variance”, Ann. Math. Stat., 28, pp. 188-194, 1957. 

[10] Swed, F. S. and Eisenhart, C. “Tables for testing randomness
of grouping in a sequence of alternatives”, Ann. Math. Stat., 14,
pp. 66-87, 1943.

[11] Wilcoxon F. and Wilcox, R.A. Some Rapid Approximate Statistical 
Procedures, Lederle Laboratories, American Cyanamid Co., 
1964. 


A ELEMENTARY THEORY 
OF ERRORS 


A1 Introduction

The theory of errors was developed by Laplace, Gauss and others
in the beginning of the 19th century. The Gaussian or the so-called
normal distribution plays a very important role in the theory. The
method of maximum likelihood is used in developing the concept of
the most probable value of any physical quantity, obtained from
repeated measurements.

Let us consider repeated measurements of a physical quantity by
means of an experimental process which is as uniform as possible.
It is a matter of common experience that the measurements are not
all identical, but fluctuate around the true (unknown) value of the
quantity. The difference of each observed value from the unknown
true value is called the experimental error or error of observation. These
errors are caused by a large number of uncontrollable (known or
unknown) factors. Since the errors of observation are uncontrollable
or random in nature, they are also called random or accidental
errors.

Mathematically, if x is the measurement of a physical quantity
whose true value is $\mu$, then

\[
e = x - \mu \qquad (A1)
\]

is called the error of measurement.
If we take n measurements, denoted by $x_1, x_2, \ldots, x_n$, then the
errors of measurement are

\[
e_i = x_i - \mu, \quad i = 1, 2, \ldots, n. \qquad (A2)
\]


A2 Normal law of errors 
That the distribution of errors follows a normal law can be
deduced from Gauss's postulate of arithmetic mean, which states :
“When any number of equally good direct measurements of an
unknown magnitude is given, then the best estimate or the most
probable value of $\mu$ is their arithmetic mean.” The postulate of
arithmetic mean can, in its turn, be deduced from the following
four elementary axioms :


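Axiom 1. The best estimate of $\mu$, say $\psi(x_1, x_2, \ldots, x_n)$, is a simple
function, possessing a single-valued continuous derivative everywhere.

Axiom 2. $\psi(x_1, x_2, \ldots, x_n)$ is a symmetric function of
$x_1, x_2, \ldots, x_n$.

Axiom 3. $\psi(x_1, x_2, \ldots, x_n)$ is independent of the origin of
measurement ; i.e.,

\[
\psi(x_1 + h, x_2 + h, \ldots, x_n + h) = \psi(x_1, x_2, \ldots, x_n) + h, \quad \text{for any } h.
\]

Axiom 4. $\psi(x_1, x_2, \ldots, x_n)$ is independent of the units of
measurement ; i.e.,

\[
\psi(kx_1, kx_2, \ldots, kx_n) = k\,\psi(x_1, x_2, \ldots, x_n), \quad \text{for any } k.
\]

From Axioms 1 and 4, we have

\[
k\,\psi(x_1, x_2, \ldots, x_n) = \psi(kx_1, kx_2, \ldots, kx_n)
= \psi(0, 0, \ldots, 0) + k \sum_{i} x_i \left[\frac{\partial \psi}{\partial x_i}\right]_{(\theta k x_1, \ldots, \theta k x_n)},
\quad 0 < \theta < 1.
\]

Making $k \to 0$, we get $\psi(0, 0, \ldots, 0) = 0$, and hence dividing by
k and making $k \to 0$ again, we have

\[
\psi(x_1, x_2, \ldots, x_n) = \sum_{i} x_i \left[\frac{\partial \psi}{\partial x_i}\right]_{(0, 0, \ldots, 0)}.
\]

By Axiom 2, the constants $\left[\partial \psi / \partial x_i\right]_{(0, 0, \ldots, 0)}$ must all be the same,
say all equal to c. We then have

\[
\psi(x_1, x_2, \ldots, x_n) = c \sum_{i} x_i.
\]

By Axiom 3,

\[
c \sum_{i} (x_i + h) = c \sum_{i} x_i + h,
\]

or $c = 1/n$.

Hence $\psi(x_1, x_2, \ldots, x_n) = \sum_i x_i / n$,
the arithmetic mean.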

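Now, if $\phi(e)$ be the probability density of the error $e = x - \mu$,
then the joint probability density of the n independent observations
$x_1, x_2, \ldots, x_n$ is

\[
f(x_1, x_2, \ldots, x_n) = \phi(e_1)\,\phi(e_2) \cdots \phi(e_n).
\]

According to Gauss's postulate, this f attains its maximum when
$\mu = \sum_i x_i / n$, i.e. when $\sum_i e_i = 0$.
We then have, differentiating $\ln f$ with respect to $\mu$,

\[
\sum_{i} \frac{\partial}{\partial e_i} \ln \phi(e_i) \times \frac{\partial e_i}{\partial \mu} = 0
\]

or

\[
\sum_{i} \frac{\phi'(e_i)}{\phi(e_i)} = 0, \quad \text{since } \frac{\partial e_i}{\partial \mu} = -1 \text{ for each } i,
\]

i.e.

\[
\sum_{i} F(e_i) = 0, \quad \text{where } F(e) = \frac{\phi'(e)}{\phi(e)}.
\]

Since, according to the postulate, this leads to the solution $\mu = \bar{x}$ or
$\sum_i e_i = 0$, we have

\[
\sum_{i} \left[ F'(e_i) + \lambda \right] de_i = 0,
\]

where $\lambda$ is a Lagrangian multiplier.
It follows that

\[
F'(e) + \lambda = 0
\]

or

\[
F(e) + \lambda e + c = 0,
\]

where c is a constant. Since $\sum_i F(e_i) = 0$ with $\sum_i e_i = 0$, we shall have
c = 0.
Thus

\[
\frac{\phi'(e)}{\phi(e)} = -\lambda e
\]

or

\[
\ln \phi(e) = -\tfrac{1}{2} \lambda e^2 + \ln A,
\]

where $\ln A$ is the constant of integration.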


Thus we have

\[
\phi(e) = A \exp\{-\tfrac{1}{2} \lambda e^2\}. \qquad (A3)
\]

Here $\lambda$ must be positive, for otherwise the integral
$\int_{-\infty}^{\infty} \phi(e)\,de$ will not converge. By putting

\[
\int_{-\infty}^{\infty} \phi(e)\,de = 1
\quad \text{or} \quad
A \int_{-\infty}^{\infty} \exp\{-\tfrac{1}{2} \lambda e^2\}\,de = 1,
\]

we have $A = h/\sqrt{\pi}$, where $h = \sqrt{\lambda/2}$.
Thus we have, finally,

\[
\phi(e) = \frac{h}{\sqrt{\pi}} \exp\{-h^2 e^2\}. \qquad (A4)
\]

Hence the distribution of the error (e) is normal with mean 0 and
standard deviation $1/(\sqrt{2}\,h)$. This h is sometimes called the index (or
modulus) of precision, since the larger the value of h, the smaller is
the value of the variance and the more precise will be the set of
measurements.
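
As a numerical check on (A4), the following Python sketch integrates the density by the trapezoidal rule and confirms that the total probability is 1 and that the variance is 1/(2h²). The value h = 1.5, the integration range and the helper names are arbitrary choices made for this illustration.

```python
import math


def error_density(e, h):
    """Normal law of errors (A4) with index of precision h."""
    return h / math.sqrt(math.pi) * math.exp(-(h * e) ** 2)


def integrate(func, lo, hi, steps=20000):
    """Plain trapezoidal rule on [lo, hi]."""
    width = (hi - lo) / steps
    total = 0.5 * (func(lo) + func(hi))
    for k in range(1, steps):
        total += func(lo + k * width)
    return total * width


h = 1.5
limit = 10.0 / h                     # the density is negligible beyond this point
area = integrate(lambda e: error_density(e, h), -limit, limit)
variance = integrate(lambda e: e * e * error_density(e, h), -limit, limit)
print(round(area, 6), round(variance, 6), round(1 / (2 * h * h), 6))
```

A larger h gives a smaller variance, which is the sense in which h measures precision.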


A3 Most probable value 

It can now be seen that if $x_1, x_2, \ldots, x_n$ be a set of equally
precise measurements on a physical quantity whose true value is $\mu$
and if the index of precision be h, then the maximum-likelihood
estimate of $\mu$ is $\bar{x}$. Here the likelihood function L is given by

\[
L(\mu) = \left(\frac{h}{\sqrt{\pi}}\right)^{n} \exp\Big[-h^2 \sum_{i} (x_i - \mu)^2\Big].
\]

Hence $\ln L = -h^2 \sum_i (x_i - \mu)^2$ + a constant independent of $\mu$, so that
the likelihood equation, $\partial \ln L / \partial \mu = 0$, gives

\[
\hat{\mu} = \bar{x}. \qquad (A5)
\]

Thus $\bar{x}$ is the maximum-likelihood estimate of $\mu$. It has been
called the most probable value of $\mu$ in the theory of errors.


If, however, the measurements $x_1, x_2, \ldots, x_n$ are not equally
precise and if their indices of precision be $h_1, h_2, \ldots, h_n$, respectively,
then the likelihood function is

\[
L(\mu) = \frac{h_1 h_2 \cdots h_n}{\pi^{n/2}} \exp\Big[-\sum_{i} h_i^2 (x_i - \mu)^2\Big].
\]

Here the maximum-likelihood estimate of $\mu$ is

\[
\hat{\mu} = \frac{\sum_i h_i^2 x_i}{\sum_i h_i^2}. \qquad (A6)
\]

Hence the most probable value of $\mu$ is now a weighted average of
the measurements, the weights being proportional to the squares of
the indices of precision.
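On the other hand, suppose we take a linear function of
observations, say

\[
z = a_1 x_1 + a_2 x_2 + \cdots + a_n x_n.
\]

Since $x_1, x_2, \ldots, x_n$ are independently normally distributed
with common mean $\mu$ and variances $\frac{1}{2h_1^2}, \frac{1}{2h_2^2}, \ldots, \frac{1}{2h_n^2}$, z also
will be normally distributed with mean $\mu \sum_i a_i$ and variance
$\frac{a_1^2}{2h_1^2} + \frac{a_2^2}{2h_2^2} + \cdots + \frac{a_n^2}{2h_n^2}$ (vide Exercise 15.11). If H is the index of
precision of z, we have

\[
\frac{1}{H^2} = \frac{a_1^2}{h_1^2} + \frac{a_2^2}{h_2^2} + \cdots + \frac{a_n^2}{h_n^2}. \qquad (A7)
\]

Thus z has the probability density

\[
f(z) = \frac{H}{\sqrt{\pi}} \exp\Big[-H^2 \Big(z - \mu \sum_{i} a_i\Big)^2\Big]. \qquad (A8)
\]

Note that $\hat{\mu}$ of (A6) is also a linear function, with
$a_i = h_i^2 / \sum_i h_i^2$ and $\sum_i a_i = 1$.
Hence the most probable value $\hat{\mu}$ is itself normally distributed with
mean $\mu$ and index of precision H, given by

\[
\frac{1}{H^2} = \frac{1}{\sum_i h_i^2}.
\]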
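
A small Python sketch of the weighted estimate (A6) and of its index of precision follows. The two measurements and their indices of precision are made-up numbers used only for illustration, and the function name `most_probable_value` is hypothetical.

```python
import math


def most_probable_value(xs, hs):
    """Weighted mean (A6) with weights h_i^2, and its index of precision H.

    H satisfies H^2 = sum of h_i^2, which follows from (A7) with
    a_i = h_i^2 / sum(h_i^2).
    """
    weights = [h * h for h in hs]
    total = sum(weights)
    estimate = sum(w * x for w, x in zip(weights, xs)) / total
    return estimate, math.sqrt(total)


# Illustrative data: the second measurement is twice as precise as the first.
estimate, precision = most_probable_value([10.0, 12.0], [1.0, 2.0])
print(estimate, precision)
```

The estimate lies closer to the more precise measurement, and the combined index of precision exceeds that of either measurement alone.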


A4 Measures of error

To give a general idea of the magnitude of errors in any given
case, it is customary to employ one of the following measures :

Error function

The probability that an error lies between −x and +x is
given by

\[
P[-x < e < +x] = \frac{h}{\sqrt{\pi}} \int_{-x}^{x} \exp[-h^2 e^2]\,de
= \frac{2h}{\sqrt{\pi}} \int_{0}^{x} \exp[-h^2 e^2]\,de
= \frac{2}{\sqrt{\pi}} \int_{0}^{hx} \exp[-y^2]\,dy
= \Phi(hx),
\]

where

\[
\Phi(x) = \frac{2}{\sqrt{\pi}} \int_{0}^{x} \exp[-y^2]\,dy. \qquad (A9)
\]

The function $\Phi(x)$ is called the error function of x and is sometimes
denoted by erf (x).


Root-mean-square error

The root-mean-square error is the standard deviation of the error
distribution. It is denoted by $\sigma$. Clearly,

\[
\sigma = \frac{1}{h\sqrt{2}}. \qquad (A10)
\]

Probable error (PE)

This is defined as a number y such that the probability that the
error lies between −y and +y is equal to 1/2. From the table of
normal distribution (i.e. Table I of Appendix B), it is seen that the
probable error (PE) is given by

\[
PE = y = 0.6745\,\sigma. \qquad (A11)
\]


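Average error ($\eta$)

The mean absolute deviation of the error distribution is defined
to be the average error ($\eta$). Thus

\[
\eta = \frac{h}{\sqrt{\pi}} \int_{-\infty}^{\infty} |e| \exp[-h^2 e^2]\,de
= \frac{2h}{\sqrt{\pi}} \int_{0}^{\infty} e \exp[-h^2 e^2]\,de
= \frac{1}{h\sqrt{\pi}} \int_{0}^{\infty} \exp[-y]\,dy, \quad \text{putting } y = h^2 e^2,
\]

\[
= \frac{1}{h\sqrt{\pi}} = \sqrt{\frac{2}{\pi}}\,\sigma = 0.7979\,\sigma. \qquad (A12)
\]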
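
The three measures can be computed together for a given index of precision, as in the following Python sketch. The value h = 2 is an arbitrary choice, the function name `error_measures` is hypothetical, and `math.erf` is used as the error function of (A9).

```python
import math


def error_measures(h):
    """Root-mean-square, probable and average error for index of precision h."""
    sigma = 1.0 / (h * math.sqrt(2.0))          # (A10)
    probable = 0.6745 * sigma                   # (A11)
    average = 1.0 / (h * math.sqrt(math.pi))    # (A12)
    return sigma, probable, average


h = 2.0
sigma, probable, average = error_measures(h)
coverage = math.erf(h * probable)   # P[-PE < e < +PE], by (A9)
ratio = average / sigma             # should be close to 0.7979
print(round(coverage, 4), round(ratio, 4))
```

The coverage of the interval (−PE, +PE) comes out at one half, up to the rounding of the constant 0.6745, and the ratio of the average error to the root-mean-square error does not depend on h.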
m 


SUGGESTED READING 


[1] Scarborough, J. B. Numerical Mathematical Analysis (Chs. 16, 17). 
Johns Hopkins, 1962, and Oxford Book Co., 1964. 

[2] Whittaker, E. and Robinson, G. The Calculus of Observations 
(Chs. 8,9). Blackie, 1944. 


B STATISTICAL 


TABLES 


N.B.—For an explanation of the terms and symbols used in the 
tables, the reader is referred to the following sections of the text ; 


1. Section 10.15 (for Table I),
2. Section 15.6 (for Tables II-V),
3. Section 20.4 (for Tables VI-VIII),
4. Section 20.6 (for Table IX),
5. Section 20.8 (for Table X).


TABLE I ORDINATES AND AREAS OF THE DISTRIBUTION OF
NORMAL DEVIATE*

 τ       φ(τ)        Φ(τ)
0.03    .3987628    .5119665
0.15    .3944793    .5596177
0.21    .3902419    …
0.51    .3502919    .6949743
0.52    .3484925    .6984632
0.53    …           .7019440
0.65    .3229724    .7421539
1.37    .1560797    .9146565

[Remaining entries for τ up to 1.50 not legible.]


TABLE I (Contd.)

 τ      φ(τ)       Φ(τ)         τ      φ(τ)       Φ(τ)         τ      φ(τ)       Φ(τ)

1.51   .1275830   …            2.01   .0529192   .9777844     2.51   .0170947   .9939634
1.52   .1256646   .9357445     2.02   .0518636   .9783083     2.52   .0166701   .9941323
1.53   .1237628   .9369916     2.03   .0508239   .9788217     2.53   .0162545   .9942969
1.54   .1218775   .9382198     2.04   .0498001   .9793248     2.54   .0158476   .9944574
1.55   …          .9394292     2.05   .0487920   .9798178     2.55   .0154493   .9946139
1.56   .1181573   .9406201     2.06   .0477996   …            2.56   .0150596   .9947664
1.57   .1163225   .9417924     2.07   …          …            2.57   .0146782   .9949151
1.58   .1145048   …            2.08   .0458611   .9812372     2.58   .0143051   .9950600
1.59   .1127042   …            2.09   .0449148   .9816911     2.59   .0139401   .9952012
1.60   .1109208   .9452007     2.10   .0439836   .9821356     2.60   .0135830   .9953388
1.61   …          .9463011     2.11   .0430674   .9825708     2.61   .0132337   .9954729
1.62   .1074061   .9473839     2.12   .0421661   .9829970     2.62   .0128921   .9956035
1.63   .1056748   .9484493     2.13   .0412795   .9834142     2.63   .0125581   .9957308
1.64   .1039611   .9494974     2.14   .0404076   …            2.64   .0122315   .9958547
1.65   .1022649   .9505285     2.15   .0395500   …            2.65   .0119122   .9959754
1.66   .1005864   .9515428     2.16   .0387069   .9846137     2.66   .0116001   .9960930
1.67   .0989255   .9525403     2.17   .0378779   …            2.67   .0112951   .9962074
1.68   .0972823   .9535213     2.18   .0370629   .9853713     2.68   .0109969   .9963189
1.69   .0956568   …            2.19   .0362619   .9857379     2.69   .0107056   .9964274
1.70   …          .9554345     2.20   .0354746   …            2.70   .0104209   .9965330
1.71   .0924591   .9563671     2.21   .0347009   .9864474     2.71   .0101428   …
1.72   …          .9572838     2.22   .0339408   …            2.72   .0098712   .9967359
1.73   .0893326   .9581849     2.23   .0331939   .9871263     2.73   …          …
1.74   .0877961   .9590705     2.24   .0324603   .9874545     2.74   .0093466   .9969280
1.75   .0862773   .9599408     2.25   .0317397   .9877755     2.75   .0090936   .9970202
1.76   .0847764   .9607961     2.26   .0310319   …            2.76   …          .9971099
1.77   .0832932   .9616364     2.27   .0303370   .9883962     2.77   .0086052   .9971972
1.78   .0818278   .9624620     2.28   .0296546   .9886962     2.78   .0083697   .9972821
1.79   …          .9632730     2.29   …          …            2.79   .0081398   .9973646
1.80   …          …            2.30   .0283270   .9892759     2.80   .0079155   .9974449
1.81   …          .9648521     2.31   .0276816   .9895559     2.81   …          .9975229
1.82   .0761433   .9656205     2.32   .0270481   …            2.82   …          .9975988
1.83   .0747663   …            2.33   .0264265   …            2.83   …          .9976726
1.84   .0734068   .9671159     2.34   .0258166   .9903581     2.84   .0070711   .9977443
1.85   …          …            2.35   .0252182   .9906133     2.85   …          .9978140
1.86   .0707404   .9685572     2.36   .0246313   …            2.86   .0066793   .9978818
1.87   …          .9692581     2.37   .0240556   .9911060     2.87   .0064907   .9979476
1.88   .0681436   …            2.38   .0234910   .9913437     2.88   …          .9980116
1.89   …          .9706210     2.39   .0229374   .9915758     2.89   .0061274   .9980738
1.90   .0656158   …            2.40   .0223945   .9918025     2.90   …          .9981342
1.91   …          …            2.41   .0218624   …            2.91   .0057821   .9981929
1.92   .0631566   .9725711     2.42   .0213407   .9922397     2.92   …          .9982498
1.93   .0619524   .9731966     2.43   .0208294   .9924506     2.93   .0054541   .9983052
1.94   .0607652   .9738102     2.44   .0203284   .9926564     2.94   …          .9983589
1.95   .0595947   .9744119     2.45   .0198374   .9928572     2.95   .0051426   .9984111
1.96   .0584409   .9750021     2.46   .0193563   .9930531     2.96   .0049929   .9984618
1.97   .0573038   .9755808     2.47   …          .9932443     2.97   .0048470   .9985110
1.98   .0561831   .9761482     2.48   .0184233   .9934309     2.98   .0047050   .9985588
1.99   .0550789   .9767045     2.49   .0179711   .9936128     2.99   .0045666   .9986051
2.00   .0539910   .9772499     2.50   .0175283   .9937903     3.00   .0044318   .9986501


TABLE I (Contd.)

 τ      φ(τ)       Φ(τ)         τ      φ(τ)       Φ(τ)         τ      φ(τ)       Φ(τ)

3.01   …          …            3.21   .0023089   .9993363     3.41   .0011910   .9996752
3.02   .0041729   .9987361     3.22   .0022358   .9993590     3.42   .0011510   .9996869
3.03   …          .9987772     3.23   .0021649   .9993810     3.43   .0011122   …
3.04   .0039276   .9988171     3.24   …          .9994024     3.44   .0010747   …
3.05   …          …            3.25   .0020290   .9994230     3.45   .0010383   …
3.06   .0036951   .9988933     3.26   .0019641   .9994429     3.46   .0010030   …
3.07   .0035836   .9989297     3.27   .0019010   .9994623     3.47   .0009689   …
3.08   .0034751   .9989650     3.28   .0018397   .9994810     3.48   .0009358   …
3.09   .0033695   .9989992     3.29   .0017803   …            3.49   .0009037   …
3.10   …          .9990324     3.30   .0017226   .9995166     3.50   .0008727   …
3.11   .0031669   .9990646     3.31   .0016666   .9995335     3.51   …          …
3.12   …          .9990957     3.32   .0016122   …            3.52   .0008135   …
3.13   .0029754   .9991260     3.33   .0015595   .9995658     3.53   .0007853   …
3.14   .0028835   .9991553     3.34   .0015084   .9995811     3.54   .0007581   …
3.15   .0027943   …            3.35   .0014587   .9995959     3.55   .0007317   …
3.16   .0027075   .9992112     3.36   .0014106   .9996103     3.56   .0007061   …
3.17   .0026231   …            3.37   …          .9996242     3.57   .0006814   …
3.18   .0025412   .9992636     3.38   .0013187   .9996376     3.58   .0006575   …
3.19   .0024615   …            3.39   …          .9996505     3.59   .0006343   …
3.20   .0023841   .9993129     3.40   .0012322   .9996631     3.60   .0006119   …

*Abridged from Table 1 of Biometrika Tables for Statisticians, Vol. I, with the
kind permission of the Biometrika Trustees.


TABLE II STANDARD NORMAL DISTRIBUTION
Values of $\tau_\alpha$


TABLE III  χ²-DISTRIBUTION*
Values of χ²_{α, ν}

  ν \ α    0.995     0.99    0.975     0.95      0.05    0.025     0.01    0.005

    1      0.000    0.000    0.001    0.004     3.841    5.024    6.635    7.879
    2      0.010    0.020    0.051    0.103     5.991    7.378    9.210   10.597
    3      0.072    0.115    0.216    0.352     7.815    9.348   11.345   12.838
    4      0.207    0.297    0.484    0.711     9.488   11.143   13.277   14.860
    5      0.412    0.554    0.831    1.145    11.070   12.832   15.086   16.750
    6      0.676    0.872    1.237    1.635    12.592   14.449   16.812   18.548
    7      0.989    1.239    1.690    2.167    14.067   16.013   18.475   20.278
    8      1.344    1.646    2.180    2.733    15.507   17.535   20.090   21.955
    9      1.735    2.088    2.700    3.325    16.919   19.023   21.666   23.589
   10      2.156    2.558    3.247    3.940    18.307   20.483   23.209   25.188
   11      2.603    3.053    3.816    4.575    19.675   21.920   24.725   26.757
   12      3.074    3.571    4.404    5.226    21.026   23.337   26.217   28.300
   13      3.565    4.107    5.009    5.892    22.362   24.736   27.688   29.819
   14      4.075    4.660    5.629    6.571    23.685   26.119   29.141   31.319
   15      4.601    5.229    6.262    7.261    24.996   27.488   30.578   32.801
   16      5.142    5.812    6.908    7.962    26.296   28.845   32.000   34.267
   17      5.697    6.408    7.564    8.672    27.587   30.191   33.409   35.718
   18      6.265    7.015    8.231    9.390    28.869   31.526   34.805   37.156
   19      6.844    7.633    8.907   10.117    30.144   32.852   36.191   38.582
   20      7.434    8.260    9.591   10.851    31.410   34.170   37.566   39.997
   21      8.034    8.897   10.283   11.591    32.671   35.479   38.932   41.401
   22      8.643    9.542   10.982   12.338    33.924   36.781   40.289   42.796
   23      9.260   10.196   11.689   13.091    35.172   38.076   41.638   44.181
   24      9.886   10.856   12.401   13.848    36.415   39.364   42.980   45.558
   25     10.520   11.524   13.120   14.611    37.652   40.646   44.314   46.928
   26     11.160   12.198   13.844   15.379    38.885   41.923   45.642   48.290
   27     11.808   12.879   14.573   16.151    40.113   43.194   46.963   49.645
   28     12.461   13.565   15.308   16.928    41.337   44.461   48.278   50.993
   29     13.121   14.256   16.047   17.708    42.557   45.722   49.588   52.336
   30     13.787   14.953   16.791   18.493    43.773   46.979   50.892   53.672
   40     20.706   22.164   24.433   26.509    55.759   59.342   63.691   66.766
   50     27.991   29.707   32.357   34.764    67.505   71.420   76.154   79.490
   60     35.535   37.485   40.482   43.188    79.082   83.298   88.379   91.952
   70     43.275   45.442   48.758   51.739    90.531   95.023  100.425  104.215
   80     51.172   53.540   57.153   60.391   101.879  106.629  112.329  116.321
   90     59.196   61.754   65.647   69.126   113.145  118.136  124.116  128.299
  100     67.328   70.065   74.222   77.929   124.342  129.561  135.807  140.169


For larger values of ν, the variable √(2χ²) − √(2ν − 1) may be used as a standard normal variable.

*Abridged from Table 8 of Biometrika Tables for Statisticians, Vol. I, with the kind permission of the Biometrika Trustees.
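
The large-ν rule quoted under Table III can be turned into an approximate upper percentage point by solving √(2χ²) − √(2ν − 1) = τ_α for χ². The following is only a minimal sketch under that reading; the function name `chi2_upper_point_approx` is an illustrative choice, not something taken from the book, and the normal point τ_α is obtained from Python's standard library rather than from Table II.

```python
from math import sqrt
from statistics import NormalDist


def chi2_upper_point_approx(alpha: float, nu: int) -> float:
    """Approximate the upper-alpha point of chi-square with nu d.f.

    Treats sqrt(2*chi2) - sqrt(2*nu - 1) as a standard normal variable,
    so chi2 is roughly 0.5 * (tau_alpha + sqrt(2*nu - 1))**2.
    """
    tau_alpha = NormalDist().inv_cdf(1.0 - alpha)  # upper-alpha normal point
    return 0.5 * (tau_alpha + sqrt(2.0 * nu - 1.0)) ** 2


# Compare with the last tabulated row (nu = 100, alpha = 0.05: 124.342).
approx_100 = chi2_upper_point_approx(0.05, 100)
print(round(approx_100, 2))
```

For ν = 100 the approximation falls a little below the tabulated 124.342, which gives a feel for its accuracy at the edge of the table.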




TABLE IV  t-DISTRIBUTION*
Values of t_{α, ν}

  ν \ α     0.05     0.025     0.01     0.005

    1      6.314    12.706   31.821    63.657
    2      2.920     4.303    6.965     9.925
    3      2.353     3.182    4.541     5.841
    4      2.132     2.776    3.747     4.604
    5      2.015     2.571    3.365     4.032
    6      1.943     2.447    3.143     3.707
    7      1.895     2.365    2.998     3.499
    8      1.860     2.306    2.896     3.355
    9      1.833     2.262    2.821     3.250
   10      1.812     2.228    2.764     3.169
   11      1.796     2.201    2.718     3.106
   12      1.782     2.179    2.681     3.055
   13      1.771     2.160    2.650     3.012
   14      1.761     2.145    2.624     2.977
   15      1.753     2.131    2.602     2.947
   16      1.746     2.120    2.583     2.921
   17      1.740     2.110    2.567     2.898
   18      1.734     2.101    2.552     2.878
   19      1.729     2.093    2.539     2.861
   20      1.725     2.086    2.528     2.845
   21      1.721     2.080    2.518     2.831
   22      1.717     2.074    2.508     2.819
   23      1.714     2.069    2.500     2.807
   24      1.711     2.064    2.492     2.797
   25      1.708     2.060    2.485     2.787
   26      1.706     2.056    2.479     2.779
   27      1.703     2.052    2.473     2.771
   28      1.701     2.048    2.467     2.763
   29      1.699     2.045    2.462     2.756
   30      1.697     2.042    2.457     2.750
   40      1.684     2.021    2.423     2.704
   60      1.671     2.000    2.390     2.660
  120      1.658     1.980    2.358     2.617
    ∞      1.645     1.960    2.326     2.576


*Abridged from Table 12 of Biometrika Tables for Statisticians, Vol. I, with the kind permission of the Biometrika Trustees.


TABLE V  F-DISTRIBUTION*
Values of F_{.05; ν1, ν2}

 ν2 \ ν1     1      2      3      4      5      6      7      8      9     10     12     15     20     24     30     40     60    120      ∞

    1     161.4  199.5  215.7  224.6  230.2  234.0  236.8  238.9  240.5  241.9  243.9  245.9  248.0  249.1  250.1  251.1  252.2  253.3  254.3
    2     18.51  19.00  19.16  [...]  19.30  19.33  19.35  19.37  19.38  19.40  19.41  19.43  19.45  19.45  19.46  19.47  19.48  19.49  [...]
    3     10.13   9.55   9.28   9.12   9.01  [...]  [...]  [...]   8.81  [...]  [...]  [...]  [...]   8.64   8.62   8.59   8.57   8.55  [...]
    4      7.71  [...]  [...]  [...]  [...]  [...]  [...]  [...]  [...]   5.96   5.91   5.86   5.80   5.77   5.75   5.72   5.69   5.66  [...]
    5     [...]  [...]  [...]  [...]  [...]  [...]  [...]  [...]  [...]  [...]   4.68  [...]  [...]   4.53   4.50   4.46   4.43   4.40  [...]

[The rows for ν2 ≥ 6 are not legible in the scan. Legible entries in the column ν1 = 12: ν2 = 6: 4.00; 7: 3.57; 8: 3.28; 9: 3.07; 10: 2.91; 11: 2.79; 12: 2.69; 13: 2.60; 14: 2.53; 15: 2.48; 16: 2.42. In the column ν1 = 15: ν2 = 6: 3.94; 7: 3.51.]

For other values of ν1 and ν2, one may use linear interpolation, taking 1/ν1 and 1/ν2 as the independent variables.
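
The interpolation rule for Table V (linear interpolation with 1/ν1 and 1/ν2 as the independent variables) can be sketched as follows. This is only an illustration in one direction at a time; the helper name `interpolate_in_reciprocal` is hypothetical, and the example assumes the tabulated F.05 values 8.62 (ν1 = 30) and 8.59 (ν1 = 40) for ν2 = 3.

```python
import math


def interpolate_in_reciprocal(nu, nu_lo, f_lo, nu_hi, f_hi):
    """Linear interpolation of a tabulated value, using 1/nu as the abscissa.

    nu_lo and nu_hi are the neighbouring tabulated degrees of freedom
    (math.inf is allowed and is mapped to 1/nu = 0).
    """
    x, x_lo, x_hi = (0.0 if math.isinf(v) else 1.0 / v for v in (nu, nu_lo, nu_hi))
    weight = (x - x_lo) / (x_hi - x_lo)  # 0 at nu_lo, 1 at nu_hi
    return f_lo + weight * (f_hi - f_lo)


# F.05 for nu1 = 35, nu2 = 3, from the neighbouring columns nu1 = 30 and 40.
f_35_3 = interpolate_in_reciprocal(35, 30, 8.62, 40, 8.59)
print(round(f_35_3, 3))
```

When both ν1 and ν2 fall between tabulated values, the same helper can be applied first along one margin and then along the other.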


TABLE V (Contd.)
Values of F_{.01; ν1, ν2}

[The body of this table is printed rotated in the scan and its entries are not recoverable.]


TABLE VI  CUMULATIVE BINOMIAL PROBABILITIES OF r OR FEWER SUCCESSES IN n INDEPENDENT TRIALS

with p = 0.5


When r > n/2, use the fact that the cumulative probability for r equals 1 − cumulative probability for n − r − 1.
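
The symmetry rule under Table VI can be checked numerically. The sketch below is only an illustration for p = 0.5; both function names are hypothetical, and the probabilities are computed directly rather than read from the table.

```python
from math import comb


def cumulative_binomial_half(r: int, n: int) -> float:
    """P(X <= r) for X ~ Binomial(n, 0.5), summed directly."""
    return sum(comb(n, k) for k in range(r + 1)) / 2 ** n


def cumulative_via_symmetry(r: int, n: int) -> float:
    """Same probability via P(X <= r) = 1 - P(X <= n - r - 1), valid for p = 0.5."""
    return 1.0 - cumulative_binomial_half(n - r - 1, n)


# With n = 10 and r = 7 (> n/2), only the entry for n - r - 1 = 2 is needed.
direct = cumulative_binomial_half(7, 10)
shortcut = cumulative_via_symmetry(7, 10)
print(direct, shortcut)
```

The identity relies on the distribution being symmetric about n/2, so it should not be carried over to tables with p ≠ 0.5.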


TABLE VII  CRITICAL VALUES OF T IN THE WILCOXON SIGNED-RANK TEST*

              Significance level of test

   n     One-tailed:   0.025    0.01    0.005
         Two-tailed:   0.05     0.02    0.01

   6                     1
   7                     2       0
   8                     4       2       0
   9                     6       3       2
  10                     8       5       3
  11                    11       7       5
  12                    14      10       7
  13                    17      13      10
  14                    21      16      13
  15                    25      20      16
  16                    30      24      19
  17                    35      28      23
  18                    40      33      28
  19                    46      38      32
  20                    52      43      37
  21                    59      49      43
  22                    66      56      49
  23                    73      62      55
  24                    81      69      61
  25                    90      77      68

*Adapted from Table 2 of Wilcoxon, F. and Wilcox, R. A. (1964): Some Rapid Approximate Statistical Procedures, Lederle Laboratories, American Cyanamid Company, Pearl River, New York, with the kind permission of the authors and the publishers.


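TABLE VIII  CRITICAL VALUES OF U FOR THE MANN-WHITNEY TEST*

(a) Significance level 0.01 for a one-tailed test and 0.02 for a two-tailed test:

[The body of part (a) is not legible in the scan, apart from the last two entries, 107 and 114, of the row for 20.]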

TABLE VIII (Contd.)
(b) Significance level 0.05 for a one-tailed test and 0.10 for a two-tailed test:

 n1 \ n2     9    10    11    12    13    14    15    16    17    18    19    20

    1                                                                   0     0
    2        1     1     1     2     2     2     3     3     3     4     4     4
    3        3     4     5     5     6     7     7     8     9     9    10    11
    4        6     7     8     9    10    11    12    14    15    16    17    18
    5        9    11    12    13    15    16    18    19    20    22    23    25
    6       12    14    16    17    19    21    23    25    26    28    30    32
    7       15    17    19    21    24    26    28    30    33    35    37    39
    8       18    20    23    26    28    31    33    36    39    41    44    47
    9       21    24    27    30    33    36    39    42    45    48    51    54
   10       24    27    31    34    37    41    44    48    51    55    58    62
   11       27    31    34    38    42    46    50    54    57    61    65    69
   12       30    34    38    42    47    51    55    60    64    68    72    77
   13       33    37    42    47    51    56    61    65    70    75    80    84
   14       36    41    46    51    56    61    66    71    77    82    87    92
   15       39    44    50    55    61    66    72    77    83    88    94   100
   16       42    48    54    60    65    71    77    83    89    95   101   107
   17       45    51    57    64    70    77    83    89    96   102   109   115
   18       48    55    61    68    75    82    88    95   102   109   116   123
   19       51    58    65    72    80    87    94   101   109   116   123   130
   20       54    62    69    77    84    92   100   107   115   123   130   138


*Adapted and abridged from Auble, D. (1953): “Extended tables for the Mann-Whitney statistic”, Bulletin of the Institute of Educational Research at Indiana University, 1, No. 2, with the kind permission of the author and the publishers.




TABLE IX  CRITICAL VALUES OF r IN THE RUN TEST*
Significance level 0.05

 n1 \ n2   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17  18  19  20

    2                                              2   2   2   2   2   2   2   2   2
    3                      2   2   2   2   2   2   2   2   2   3   3   3   3   3   3
    4                  2   2   2   3   3   3   3   3   3   3   3   4   4   4   4   4
    5              2   2   3   3   3   3   3   4   4   4   4   4   4   4   5   5   5
    6          2   2   3   3   3   3   4   4   4   4   5   5   5   5   5   5   6   6
    7          2   2   3   3   3   4   4   5   5   5   5   5   6   6   6   6   6   6
    8          2   3   3   3   4   4   5   5   5   6   6   6   6   6   7   7   7   7
    9          2   3   3   4   4   5   5   5   6   6   6   7   7   7   7   8   8   8
   10          2   3   3   4   5   5   5   6   6   7   7   7   7   8   8   8   8   9
   11          2   3   4   4   5   5   6   6   7   7   7   8   8   8   9   9   9   9
   12      2   2   3   4   4   5   6   6   7   7   7   8   8   8   9   9   9  10  10
   13      2   2   3   4   5   5   6   6   7   7   8   8   9   9   9  10  10  10  10
   14      2   2   3   4   5   5   6   7   7   8   8   9   9   9  10  10  10  11  11
   15      2   3   3   4   5   6   6   7   7   8   8   9   9  10  10  11  11  11  12
   16      2   3   4   4   5   6   6   7   8   8   9   9  10  10  11  11  11  12  12
   17      2   3   4   4   5   6   7   7   8   9   9  10  10  11  11  11  12  12  13
   18      2   3   4   5   5   6   7   8   8   9   9  10  10  11  11  12  12  13  13
   19      2   3   4   5   6   6   7   8   8   9  10  10  11  11  12  12  13  13  13
   20      2   3   4   5   6   6   7   8   9   9  10  10  11  12  12  13  13  13  14


*Adapted from Swed, F. S. and Eisenhart, C. (1943): “Tables for testing 
randomness of grouping in a sequence of alternatives”, Annals of Mathematical 
Statistics, 14, pp. 83-86, with the kind permission of the authors and the editor. 


TABLE X  CRITICAL VALUES OF r_s, THE SPEARMAN RANK CORRELATION COEFFICIENT*

         Significance level (one-tailed test)

   n         0.05       0.01

   4        1.000
   5         .900      1.000
   6         .829       .943
   7         .714       .893
   8         .643       .833
   9         .600       .783
  10         .564       .746
  12         .506       .712
  14         .456       .645
  16         .425       .601
  18         .399       .564
  20         .377       .534
  22         .359       .508
  24         .343       .485
  26         .329       .465
  28         .317       .448
  30         .306       .432


*Adapted from Olds, E. G. (1938): “Distributions of sums of squares of rank differences for small numbers of individuals”, Annals of Mathematical Statistics, 9, pp. 133-148, and from Olds, E. G. (1949): “The 5% significance levels for sums of squares of rank differences and a correction”, Annals of Mathematical Statistics, 20, pp. 117-118, with the kind permission of the author and the editor.




Alternative hypotheses, 453-454 

Anderson, T. W., 152

Array distribution (see conditional distri- 
bution) 

Array mean, 351-352, 357-360 

Array variance, 352 

Arthashastra, 152 

Association, 124, 310-311, 351 

— absolute, 311, 313
— and causal relationship, 322-323
— coefficient of, 312
— complete, 311, 312
— joint, 320
— measures of, 125, 312-314, 318-322
— multiple, 320
— negative, 311
— partial, 321
— perfect, 311, 317
— positive, 311
— total, 321

Asymmetrical distribution, 252

Asymmetry (see skewness)

Attribute, definition, 182, 308 

Attributes, independent, 310, 315, 319 

Average, 205, 215-216 

Average error, 621 


b₁, b₂ coefficients, 256

Band chart, 169-170 

Bar diagram, 170-174 

Bernoulli, J., 102, 132, 268 

Bernoulli's theorem, 132 

Beta function, 30-32 

Binomial distribution, 268-277, 291-292 
— cumulative probability of, 302 
— fitting of, 273-277 
— moments of, 270-272 
— recursion relation concerning moments of, 272-273
Binomial series, 8-9 
Biserial correlation, 409-411 


INDEX 


Bivariate data, 330-332 

Bivariate frequency distribution, 332 
Bivariate interpolation, 70-72 
Bivariate normal distribution, 351-353 
Bose, R. C., 154 

Box, G. E. P., 58) 


Calcutta University, 153 
Cauchy-Schwarz inequality, 11-12 
Central limit theorem, 544 

Central moments, definition, 242, 264 
Central Statistical Organisation, 153 
Central tendency, definition, 204-205 
Charlier checks, 245-246 


Chebyshev’s inequality, 130
Chebyshev’s lemma, 129-130
Chi-square distribution, 428-431
Class boundaries, 191-192 
Class interval, 191 
Class limits, 191 
Cochran, W. G., 152, 562 
Coefficient of colligation, 312 
Coefficient of contingency, Karl Pearson’s, 317
— Tschuprow’s, 317
Coefficient of variation, 236 
Co-factor, 16 
Collection of data, 157-160 
Column diagram, 193-194 
Combination of tests, 530-531 
Component-parts chart (see band chart) 
Compound probability, 110, 113 
Concentration, area of, 237 
— coefficient of, 237 
— curve of, 236-237 
Conditional distribution, 309, 332, 350- 
351 
Conditional probability, 109-111 
Confidence coefficient, 450-451 
Confidence limits (interval), definition, 
450-451 




Consistency, 443 
Convergence in probability (stochastic), 
129-132, 443 
Convergence of iteration method, 86-87
— of Newton-Raphson method, 87-88
Corner test for association, 605-607
Correlation, 333-334
Correlation coefficient (simple), definition, 125, 334-336
— limitations of, 353-355
— properties of, 336-338, 353 
Correlation index, 355-357 
Correlation ratio, 357-360 
Counting numbers, 5 
Covariance, 125, 127, 335-336 
Critical region, 455-457 
— UMP, 456 
— UMPI, 457 
— unbiassed, 457 
Cumulative frequency, definition, 188, 192 
Cumulative frequency diagram, 196-197 
Cumulative proportion, definition, 188


Darwin, 149-150 
Data, frequency, 162-163 

— non-frequency, 162 

— primary, 157 

— secondary, 156 
Degrees of freedom, 428, 431, 433, 459
De Morgan, 101 
De Moivre, 149 
Derivative, 22-25 

— partial, 23 

— application of, 23-25 
Determinants, 15-17 
Diagrammatic representation of data, 166-178
Differential coefficient (see derivative)
Dirichlet integral, 33-34
Disassociation, 311
Dispersion, definition, 224-225
Distribution-free procedures, 580, 600-601

Distribution, types of, 198-201 
Divided-bar diagram, 176 
Divided differences, 60-62 
Dot diagram (see scatter diagram) 




Edgeworth, F. Y., 295, 297 
Effective range of normal variable, 290 
Efficiency, 443-444 

Elements of a matrix, 13
— of a set, 5
Error, absolute, 37
— different types of, 37-40
— in a tabular value, 43-45
— percentage, 37
— relative, 37
Errors I and II, 455-456
Error function, 620
Event(s), certain, 102
— complement of, 100
— difference of, 100
— elementary, 99
— equally likely, 101
— exhaustive, 105
— impossible, 102
— independent, 112-113
— intersection of, 100
— mutually exclusive, 104-105
— union of, 99

Exponential series, 9 
Extrapolation, 54 


F-distribution, 433-435 
Feller, W., 152 
Finite differences, definition, 42-43 

— leading, 42 
Fisher, R. A., 150-151, 445, 486, 488 
Fisher-Behrens problem, 487-494 
Fisher's t test, 486
Fractile (see quantile) 
Frequency curve, 197-198 
Frequency, definition, 117, 183 
Frequency bar chart (see column diagram) 
Frequency chi-square, 560-562
Frequency density, definition, 192
Frequency distribution, of attribute, 183-185
— of variable, 185-193
Frequency polygon, 194 
Function, concept of, 20 

— continuous, 22 

— discontinuous, 22 

— limit of a, 20 




Function (contd.) 
— maximum of a, 23-25
— minimum of a, 23-25
— rational integral (or polynomial), 41
— stationary value of a, 24 


g₁, g₂ coefficients, 252, 255
Galton, 150

Gamma function, 31-34 

Gauss, K. F., 286, 615 

Geary’s ratio, 537 

Geometrical probability, 114-117 
Gini, C., 233 

Gosset, W. S., 150-151
Grade correlation, 400-401 
Gram-Charlier series, 295, 297 


Histogram, definition, 194-195 
Historical data (see time series data) 
History of statistics, 147-154 

— in India, 152-154
Homoscedasticity, 352, 487
Hypergeometric distribution, 265-267
Hypothesis, 451-453, 458-459
— alternative, 452-453
— composite, 458-459

— null, 451 

— simple, 458-459 


Inaccuracies, different types of, 35-36 
Index of precision, 618-619 
Indian Statistical Institute, 153-154 
Inequalities, 10-12 
Infinity, meaning of, 21 
Integers, positive and negative, 5-6 
Integral, definite, 25-26 
— double and multiple, 28-30 
— indefinite, 27-28
— infinite (or improper), 26-27 
— transformation of, 29-30 
Inter-class correlation, 402 
Interpolation, definition, 41 
Interpolation formula, Bessel’s, 65-67 
— central difference, 63-68 
— Lagrange’s, 55-58, 61 
— Laplace-Everett, 67-68 




Interpolation formula (contd.) 
— Newton's backward, 51-52 
— Newton's divided difference, 62
— Newton's forward, 50-51
— Newton-Gauss backward, 64
— Newton-Gauss forward, 63-64
— remainder term in, 69-70 
— Stirling's, 65 
Interval estimation, 449-451, 582 
Intra-class correlation, 402-406 
— population, 407 
Invariance, 449 
Inverse interpolation, 56 
Iowa State College, 152
Irrational numbers, 7


Jacobian, 29-30
Joint distribution, 123-124, 308, 350


Kempthorne, O., 15 
Kendall, M. G., 397 
Kolmogorov, A. N., 119, 152, 599 
Kolmogorov-Smirnov tests, 599-601 
Kurtosis, 255-256 

— measure of, 255 


Lagrange multipliers, 25 
Laplace, 102, 149, 615 
Large-sample approximations, 543-545 
Large-sample tests and confidence intervals for means, 545-546
— for Poisson parameter, 550-552
— for proportions, 546-548
— for standard deviations, 554-556
Laws of operations, 14, 101 
— associativity, 14, 101 
— commutativity, 14, 101 
— distributivity, 14, 101 
— idempotency, 101 
Least-square method, 344, 367 
Leptokurtic curve, 256 
Level of significance, 452, 455 
Likelihood-ratio tests, 459-463 
Limits, properties of, 21 
— some important, 22 
Line diagram, 167-170 




Linear equations, 18-19 
— consistent, 19 
— Cramer's rule for solution of, 19 
— homogeneous, 18 
— non-homogeneous, 18 
— non-trivial solution of, 19 
Linear transformation, 18 
Logarithmic series, 9 
Log-normal distribution, 295-297 
Logarithmic transformation for sample 
variance, 558 
Lorenz curve (see curve of concentration) 


Mahalanobis, P. C., 153-154
Mann-Whitney U test, 590-591 
Marginal distribution, 124, 309, 332, 
350, 352 
Matrix, 12-15, 17-18 
— diagonal, 14 
— equal, 13 
— inverse, 17 
— non-singular, 17 
— orthogonal, 15, 18 
— rank of, 17 
— singular, 17 
— square, 13 
— symmetric, 13 
Maximum-likelihood method, 446-449 
Mean, arithmetic, 205-207, 218 
— geometric, 215-216, 218 
— harmonic, 216-217, 218 
Mean deviation, 225-228 
Mean difference, 233-234 
Median, 208-210, 265 
Median test, 592-594 
Members of a set (see elements of a set) 
Mendel, Gregor, 146, 149 
Mesokurtic curve, 256 
Method of moments, 273-274, 282, 292 
Minimum-variance unbiassed estimator, 
442-443 
Minor, 16-17 
Missing term, 47 
Mode, 211-213, 265 
Moments, about an arbitrary origin, 243- 
248, 264-265 
— central, 242-248, 264-265 




Mood's rank test for dispersion, 594-595 
Most probable value, 618-619 
Multicollinearity, 527 
Multiple-axis chart, 170 
Multiple-bar diagram, 171-174 
Multiple correlation, 373-376 
— in terms of total and partial 
correlations, 381-382 
Multiple regression, 366-370, 375-376 
Multivariate data, 366 
Multivariate normal distribution, 386- 
388 


National Sample Survey, 153 
Negative binomial distribution, 284-285 
Neyman, J., 151-152, 155, 455, 489 
Nonparametric methods, 580-581 
Normal deviate, definition, 289, 425-427 
Normal distribution, 286-295, 300
— fitting of, 292-294
— importance of, 294-295
— properties of, 287-290
Normal law of errors, 615-618 
Numerical differentiation, 72-73 
Numerical integration, definition, 74 

— Gregory’s formula, 94 

— Euler-Maclaurin formula, 80-81 

— relative accuracy of quadrature formulae, 77-79
— Simpson’s one-third rule, 75-76
— Simpson’s three-eighths rule, 94
— trapezoidal rule, 74-75
— Weddle's rule, 76-77
Numerical solution of equations, definition, 82-83
— Horner’s method, 88-91
— iteration method, 86-87
— method of false position, 83

— Newton-Raphson method, 84, 87 


Ogive (see cumulative frequency diagram)
Operators, 45-49, 92
— Δ, E, 45-49
a., 8, #, 92 
Order statistic, 581 
Orthogonal transformation, 18, 435-436 
Outlier, 582 




Paired t-test, 497-498 
Parameter, definition, 265-266, 273, 417 
Parameter-free procedure, 581 
Partial correlation, 376-379 
— of higher order in terms of coeffi- 
cients of lower order, 384 
— of lower order in terms of coeffi- 
cients of higher order, 385 
Partial regression coefficient, 370, 379-380 
Pearson, E. S., 151, 455 
Pearson, Karl, 150-151, 297, 317, 561 
Pearsonian chi-square (see frequency chi-square)
— test for goodness of fit, 562-566
— test for homogeneity, 566-568
— test for independence, 568-570
— simplified formulae, 570-572
Pearsonian curve, points of inflection of, 300
Pearsonian curve, type I, 298
— type II, 300
— type III, 299
— type IV, 299
— type V, 299 
— type VI, 299 
— type VII, 300 
Pearsonian differential equation, 297-298 
Pearsonian system of curves, 297-300 
Pictorial diagram, 174 
Pie diagram, 176-177 
Platykurtic curve, 256 
Point estimation, theory of, 441-446 
Point of inflection, 24 
Poisson distribution, 277-283, 291-292 
— cumulative probability of, 302 
— fitting of, 282-283 
— moments of, 279-281 
— recursion relation concerning moments of, 281
Population, definition, 260
Power function, 456
Prediction limits, 505, 526
Presentation of data, 163-178
Presentation of data, diagrammatic, 166-178
— tabular, 164-166
— textual, 163-164


Presidency College, Calcutta, 153 
Probability, axiomatic approach, 117-119
— classical definition, 101-102, 114-117
Probability-density function, 261-262
Probability in continuum (see geometrical probability)
Probability-mass function, 260-261
Probability, meaning of, 98
Probability sampling, 415
Probable error, 418, 620 
Product notation, 4-5 
Publicistics, 149 


Quadratic form, 18
— non-negative, 18 
— positive definite, 18 
— positive semi-definite, 18 
— rank of, 17 
Quantile, 234, 581 
Quantile test, 609 
Quartile, 234 
Quartile deviation, 234-235 
Quetelet, 149 


Random order, 103 
Random sampling, 106, 112, 128, 415- 
417, 422-424 
Random sampling from a probability 
distribution, 422-424 
— simple: with replacements, 128, 
416 
— simple: without replacements, 
128, 416-417 
Random variable, 119-129, 260-262 
Rank, 392 
Rank correlation coefficient, 392-400 
— Kendall's, 397-400 
— Spearman’s, 392-395, 603-604 
Range, 225 
Rao, C. R., 152, 154 
Ratio chart, 167-169
Rational numbers, 6 
Real number system, 5-7 
Rectangular distribution, 285-286 
Regression, 343-348, 351-352, 355-356 


Regression coefficient, 344-345, 351, 355- 
356 


77-79 


Relative dispersion, 235-236 


Rounding off, 36 
Roy, S. N., 152, 154 
Run test for randomness, 602-603 


Sample, 260 
Sampling distribution, definition, 417- 
418 
— of sample mean from normal distribution, 435-437
— of sample variance from normal distribution, 435-437
— with binomial variables, 424-425
— with Poisson variables, 425
Sampling fluctuations, 214-215, 249, 
417 
Sankhyā, 153 
Scatter diagram, 333 
Scedasticity, 352 
Scrutiny of data, 160-162 
Semi-interquartile range (see quartile 
deviation) 
Semi-logarithmic chart (see ratio chart) 
Separation of symbols, 49 




— absolutely convergent, 8 


— empty (or null), 5 

Sheppard’s corrections, 248-250 

Sign test, 583-585, 587-588 

Significant figures, 37 

Signed rank test, 585-587, 588 

Sin⁻¹ transformation for proportion, 557

Skewness, 199, 251-254 

Smirnov, 152, 599 

Smoking and lung cancer, 323-326 

Snedecor, G. W., 152 

Snedecor-Brandt formula, 571 

Spatial-series data, 162 

Spearman, C., 393, 603

Square-root transformation for Poisson variable, 558

Standard deviation, 228-231

Standard error, definition, 418
— for central moments, 553
— for coefficient of variation, 553
— for correlation coefficient, 554
— for g₁, g₂ coefficients, 553
— for mean, 419-421
— for median, 554
— for proportion, 422
— for quantiles, 554
— for s.d., 553
— for variance, 553

Standard error of estimate, 347, 375 

Standard normal variable (see normal 
deviate) 

Statistic, definition, 417 

Statistics, a historical note, 147-152 

Statistics (plural), definition, 141 

— (singular) definition, 141 


Statistics, in agriculture, 145 
— in business and commerce, 144
— in India, 152-154
— in matters of state, 143-144
— in sciences, 145-147


Statistics, is it a science?, 141-143
Statistical map, 177-178
Statistical quality control, 232 


Sufficiency, 445-446 

Sukhatme’s test for dispersion, 595 
Sum notation, 3-4 

Symmetrical distribution, 198, 251-252 


¢-distribution, definition, 431-432 
Tabular representation of data, 164-166


Test of significance, theory of, 455-459 
Tests of significance, relating to binomial distribution, 467-470
— bivariate normal distribution, 496-500
— correlation (multiple), 520
— correlation (partial), 520
— correlation (simple), 496-497, 557
— independence of two attributes, 472-475
— means of more than two univariate normal distributions, 512-517
— multiple regression equation, 521-527
— normality, 535-538
— one univariate normal distribution, 475-482
— outliers, 532-535
— Poisson distribution, 471-472
— regression, 500-507
603-634
— two univariate normal distributions, 481-491


Test of significance (contd.) 


— of Smith and Satterthwaite, 490-491
— of Welch, 491
— variances of more than two normal distributions, 517-519


Tetrachoric correlation, 408-409 
Theoretical distribution, 260-265, 350- 


353, 386-388 


Tie, 394 

Time-series data, 162
Tolerance interval, 583 
Total correlation, 377 
Total probability, 111 


Unbiassed estimator of mean, 419-421
— of proportion, 421-422
— of regression coefficient, 501
— of variance, 449, 475
Unbiassedness, of estimator, 442
— of test, 457


Variables, 183, 185 


— continuous, 185
— discrete, 185
— independent, 124, 551
— random (or stochastic), 119
Variance, definition, 121-122, 127-128, 242


Variates (see variables) 
Vectors, 15 


— column, 15
— linearly dependent, 15
— linearly independent, 15
— orthogonal, 436
— row, 15


Wald, A., 152 
Wald-Wolfowitz run test, 596-598 


Wallis’ formula, 94
Weak law of large numbers, 129-132
Weighted mean, 217-218
Wilcoxon, F., 588, 589
Wilcoxon rank sum test, 589-590
— signed-rank test, 588-589


Yates’ correction for continuity, 572-574
Yates, F., 572
Yule, G. U., 150, 312


z-transformation for correlation coefficient, 557-560


